Published in Vol 28 (2026)

This is a member publication of University of Bristol (Jisc)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/85414.
Multimodal AI for Alzheimer Disease Diagnosis: Systematic Review of Datasets, Models, and Modalities

Authors of this article:

Ziwen Yu1; Anthony Mulholland1; Tianyan Huang2; Qiang Liu1

1School of Engineering Mathematics and Technology, University of Bristol, Tankard's Close, Ada Lovelace Building, Bristol, United Kingdom

2Medical Physics and Biomedical Engineering, University College London, London, United Kingdom

*these authors contributed equally

Corresponding Author:

Qiang Liu, PhD


Background: Early detection of Alzheimer disease (AD) is essential for timely intervention; yet, diagnostic performance varies widely across modalities and datasets. Recent multimodal artificial intelligence (AI) models have made significant progress, but the evidence base remains fragmented due to heterogeneous datasets, modeling frameworks, and reporting quality.

Objective: This systematic review aimed to analyze studies on multimodal AI models for AD diagnosis, prognosis, and risk prediction over 5 years. We evaluated dataset characteristics, modality combinations, modeling strategies, performance metrics, and methodological limitations. We further discuss real-world implications and translational pathways.

Methods: Following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines, we systematically searched PubMed, IEEE Xplore, Scopus, ACM Digital Library, Cochrane, and arXiv, with the final datasets last searched on November 15, 2025. Studies applying multimodal machine learning or deep learning to AD, mild cognitive impairment, and dementia outcomes were included, whereas studies using a single modality or lacking sufficient methodological detail were excluded. QUADAS-2 (Revised Quality Assessment of Diagnostic Accuracy Studies tool) assessed risk of bias. Extracted performance results were synthesized across 4 major multimodal dataset families.

Results: A total of 66 studies met the inclusion criteria. Across datasets, multimodal models consistently outperformed single-modal baselines. Alzheimer’s Disease Neuroimaging Initiative–based diagnosis achieved an average accuracy of 92.5% (SD 3.8%), while mild cognitive impairment–conversion models achieved an average area under the curve (AUC) of 0.922 (SD 0.045), and several fusion architectures reported AUCs above 0.95. In contrast, UK Biobank risk-prediction studies reported an average AUC of 0.84 (SD 0.056), and this reflects performance in large, population-based datasets. DementiaBank speech-language studies achieved an average AUC of 0.813 (SD 0.042), and cross-lingual AD detection achieved an accuracy of 77% (SD 6.5%). Self-collected multimodal datasets demonstrated average accuracies around 96% (SD 2.4%), but their generalizability is limited due to small sample sizes and single-center designs.

Conclusions: This systematic review demonstrates that multimodal AI models consistently outperform single-modal models for AD diagnosis, prognosis, and risk prediction by integrating complementary biological, clinical, and behavioral information. Unlike prior reviews, this review provides a unified synthesis across heterogeneous clinical, imaging, genetic, and linguistic datasets, enabling cross-domain comparison of modeling strategies and performance. However, the generalizability of reported performance was limited due to substantial heterogeneity in dataset composition, outcome definitions, and validation, and prevalent risks of bias. By evaluating these factors, this review clarifies where current evidence is robust and where caution is warranted. The findings highlight the need for standardized multimodal benchmarks, transparent evaluation protocols, and clinically grounded model design to enable reliable real-world deployment. Overall, this work advances the field by framing multimodal AI not only as a performance-driven tool but also as a translational framework for equitable, interpretable, and scalable AD diagnosis.

Trial Registration: PROSPERO CRD420251241895; https://www.crd.york.ac.uk/PROSPERO/view/CRD420251241895

J Med Internet Res 2026;28:e85414

doi:10.2196/85414


Alzheimer disease (AD) is the most prevalent neurodegenerative disorder and the leading cause of dementia worldwide [1]. With an aging global population, AD has become one of the most costly and deadly diseases of the 21st century, imposing profound emotional, financial, and caregiving burdens on patients, families, and health systems. By 2050, the number of people with AD is projected to rise from 55 million in 2020 to approximately 139 million [1].

The progression of AD includes the preclinical stage, mild cognitive impairment (MCI), and symptomatic stages, with varying degrees of symptom severity. The preclinical stage is a key window for intervention, during which neuropathological changes have commenced, but clinical symptoms remain largely undetectable [2]. Despite advances in awareness and screening, up to 75% of dementia cases remain undiagnosed worldwide, particularly in low- and middle-income countries [3]. This persistent diagnostic gap highlights the need for low-cost, scalable, and accurate early detection tools to enable timely intervention and slow disease progression [4].

Artificial intelligence (AI) has emerged as a promising approach for improving the early detection and management of AD. By systematically integrating and analyzing multimodal data, AI-based diagnostic frameworks offer powerful tools to enhance early detection accuracy and facilitate timely intervention.

Recent work has used transformer-based models to integrate imaging, genetic, and linguistic data. Multimodal transformers combining magnetic resonance imaging (MRI) or positron emission tomography (PET) with clinical features and cognitive assessments have reported improved diagnostic accuracy and interpretability [5-7]. In parallel, GPT-style architectures, BERT (Bidirectional Encoder Representations From Transformers) variants, and domain-adapted language models improve extraction of linguistic and semantic markers linked to early cognitive decline [8,9]. Self-supervised speech models also perform strongly for detecting MCI and early AD from spontaneous speech [10]. Together, these advances reflect a shift toward unified, more interpretable, and clinically translatable multimodal systems that capture both biological and behavioral aspects of AD.

Traditional machine learning (ML) [11], ensemble methods [12], deep learning [13], and reinforcement learning (RL) [14] can perform well on unimodal data, but clinical diagnosis integrates structural and behavioral information [15]. Unimodal AI can therefore diverge from clinical workflows and miss complementary signals (eg, MRI for structural change plus speech features for cognitive decline [16]), increasing the risk of modality-specific overfitting and poorer real-world performance. Accordingly, recent work has shifted toward multimodal integration for AD diagnosis, yet many studies emphasize incremental accuracy gains while underaddressing generalizability, interpretability, and cost-effectiveness needed for adoption. The literature also remains fragmented: recent reviews often cover multimodal clinical phenotyping datasets [17] and multimodal linguistic cognitive-impairment datasets [18] separately, obscuring cross-modal insights such as how imaging and speech biomarkers might jointly improve early detection.

Recent multimodal methods have substantially improved AD detection. However, a comprehensive systematic review that integrates evidence across clinical and linguistic modalities and fusion strategies, and that critically evaluates methodological quality, dataset diversity, and reporting transparency, is still lacking. To address these gaps, this review investigates how multimodal models are applied to AD diagnosis, prognosis, and risk prediction and compares performance across different modality combinations and dataset families published between 2019 and 2025. We also examine modeling and fusion strategies alongside validation practices and assess methodological quality and risk of bias using QUADAS-2 (Revised Quality Assessment of Diagnostic Accuracy Studies tool). Furthermore, key multimodal combinations within public datasets are analyzed in relation to their diagnostic performance, and datasets are categorized to evaluate their suitability for AD research and clinical translation. Overall, this review provides a comprehensive synthesis of multimodal AI in AD diagnosis, bridges previously disconnected research streams, and offers practical guidance for future model development and clinical adoption.


Study Design

This review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 guidelines [19], with the search procedures reported following PRISMA-S (Preferred Reporting Items for Systematic Reviews and Meta-Analyses literature search extension) [20] and developed using the principles outlined in the Cochrane Handbook [21]. These methods were applied to systematically identify and evaluate studies on computer-aided AD diagnosis, with a particular focus on those using multimodal clinical phenotyping datasets and multimodal linguistic-based cognitive impairment datasets.

Source of the Study and Search Criteria

We developed and internally reviewed independent search strategies (no external peer review). We manually searched multiple databases to identify AI-driven multimodal approaches for AD diagnosis, rather than using an integrated multidatabase platform. As this review targets methodological advances mainly reported in peer-reviewed computational literature, we did not search trial registries (ClinicalTrials.gov, World Health Organization’s International Clinical Trials Registry Platform). We also avoided validated or published filters, instead iteratively refining customized controlled-vocabulary and free-text terms for AD or dementia, multimodal data, and AI through pilot screening to maximize sensitivity.

Searches were performed in PubMed (447 records; January 1, 2019, to November 13, 2025), Scopus (1086 records; all years through November 13, 2025, filtered to PUBYEAR > 2018), IEEE Xplore (2229 records; January 1, 2020, to November 13, 2025), ACM Digital Library (2067 records; January 1, 2020, to November 15, 2025), Cochrane Library (1061 records; all available years through November 15, 2025), and arXiv (1081 records; all available years through November 15, 2025). We included the verbatim search strings for all databases, and because arXiv does not support bulk export, an arXiv search Python (Python Software Foundation) script is provided in Multimedia Appendix 1 [11,14,22-44].
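Because arXiv exposes a public Atom-feed API rather than a bulk-export interface, such a script typically assembles a `search_query` URL and pages through the results. The sketch below is a hypothetical illustration of how a query might be constructed (the review's actual script is in Multimedia Appendix 1); the search terms shown are examples, not the review's verbatim strings.

```python
# Hypothetical arXiv query builder -- a minimal sketch, not the script from
# Multimedia Appendix 1. Combines free-text terms into an arXiv API request.
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(terms, start=0, max_results=100):
    """Join quoted free-text terms with AND and return the full request URL."""
    search = " AND ".join(f'all:"{t}"' for t in terms)
    params = {
        "search_query": search,
        "start": start,             # offset for paging through results
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

# Example terms (illustrative only, not the review's actual search string):
url = build_arxiv_query(["Alzheimer", "multimodal", "deep learning"])
```

The returned URL can then be fetched and its Atom XML parsed (eg, with `xml.etree.ElementTree`) to extract titles, abstracts, and dates for screening.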

Eligibility Criteria

Studies were considered eligible if they met all of the following conditions: (1) focused on AD, MCI, or related dementias as the primary clinical outcome; (2) applied AI or ML methods for computer-aided diagnosis, classification, or prediction; (3) used multimodal data, defined as any combination of at least two distinct modalities (eg, neuroimaging, clinical phenotyping, genetics, or linguistic features); (4) reported quantitative evaluation metrics; and (5) were written in English.

Studies were excluded if they met any of the following conditions: (1) single-modal approaches using only a single imaging modality, cognitive test, or biomarker, without any multimodal integration; (2) works without reported performance metrics or with insufficient methodological detail; (3) works not addressing diagnosis, classification, or prediction (eg, treatment response, drug trials, and lifestyle interventions); (4) duplicate publications or overlapping datasets without additional methodological contribution; and (5) non-English publications.

Selection Process

The study selection process followed the PRISMA 2020 guidelines, and the protocol was registered. The final search update was conducted in November 2025. All records retrieved from the databases were first imported into Zotero, where duplicates were automatically detected and removed.

The initial search identified 7435 records. After removing 3047 duplicates, 4388 records remained for title and abstract screening. Of these, 4021 records were clearly irrelevant at the title and abstract level, and 252 studies were excluded for the following main reasons:

  • Focused on outcomes unrelated to AD diagnosis, classification, or prediction (eg, drug trials, treatment response, lifestyle interventions; n=140).
  • Used unimodal data without multimodal integration (n=46).
  • No sufficient methodological details (n=47).

Finally, 66 studies were included in the systematic synthesis, and all were successfully retrieved (reports not retrieved=0).

Overview of AI-Assisted AD Diagnosis

The workflow of AI-assisted AD diagnosis involves 3 stages, as illustrated in Figure 1. The first stage is comprehensive data acquisition, where information is collected from multiple modalities, including neuroimaging, biomarkers, genetics, and speech or behavioral signals. The second stage covers feature extraction and model development, and the third stage applies interpretable and explainable analysis to ensure that AI models can effectively support clinical decision-making.

Figure 1. Overview of the AI pipeline for AD diagnosis. Multimodal inputs undergo preprocessing and feature extraction before model training for classification or regression tasks. Model interpretability supports explanation and performance evaluation. AD: Alzheimer disease; AI: artificial intelligence; LIME: Local Interpretable Model-Agnostic Explanations; SHAP: Shapley Additive Explanations; XAI: explainable artificial intelligence.
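As a rough illustration of this three-stage flow, the sketch below mirrors Figure 1 with placeholder functions. Every feature name, weight, and threshold here is invented for demonstration and does not correspond to any reviewed model.

```python
# Toy skeleton of the three-stage pipeline in Figure 1:
# (1) multimodal acquisition, (2) feature extraction + modeling,
# (3) interpretation. All values and weights are illustrative placeholders.
WEIGHTS = [0.5, 0.3, -0.8, -1.0]  # invented linear "model" coefficients

def acquire(subject_id):
    # Stage 1: gather modalities (stubbed with toy values).
    return {"mri": [0.42, 0.37], "speech": [0.12], "mmse": 24}

def extract_features(record):
    # Stage 2a: flatten heterogeneous modalities into one feature vector.
    return record["mri"] + record["speech"] + [record["mmse"] / 30.0]

def predict(features):
    # Stage 2b: placeholder classifier -- a fixed linear score.
    score = sum(w * f for w, f in zip(WEIGHTS, features))
    return {"risk_score": score, "label": "AD" if score > 0 else "NC"}

def explain(features):
    # Stage 3: per-feature contribution (a crude stand-in for SHAP/LIME).
    return {f"f{i}": w * f for i, (w, f) in enumerate(zip(WEIGHTS, features))}

features = extract_features(acquire("subj-001"))
result = predict(features)
contributions = explain(features)
```

Real pipelines replace each stub with preprocessing, a trained model, and a principled attribution method, but the control flow is the same.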

Performance Evaluation Metrics

To ensure the clinical applicability and scientific rigor of computer-aided diagnosis models for AD, it is essential to systematically evaluate their performance using a variety of quantitative metrics. We have summarized all performance evaluation metrics in Multimedia Appendix 2.
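For readers unfamiliar with these metrics, the minimal sketch below computes accuracy, sensitivity, specificity, and AUC (via the Mann-Whitney rank formulation) from toy binary labels; it is illustrative only, not the evaluation code of any included study.

```python
# Common diagnostic metrics from scratch (binary classification).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def basic_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on AD cases
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on controls
    }

def auc(y_true, scores):
    # Mann-Whitney U view of AUC: the probability that a randomly chosen
    # positive case receives a higher score than a randomly chosen negative.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

m = basic_metrics([1, 1, 0, 0], [1, 0, 0, 0])
a = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
```

In this toy example, one of two positives is detected (sensitivity 0.5) while both negatives are correct (specificity 1.0), and the scores rank every positive above every negative (AUC 1.0).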

Risk of Bias and Quality Assessment

We assessed methodological quality with QUADAS-2 [45], evaluating risk of bias in 4 domains. Patient selection raised the main concern: 61% (40/66) of outcomes were high risk due to poor reporting or nonrepresentative sampling. For the index test, 76% (50/66) were unclear risk because procedures and decision thresholds were insufficiently described, and 20% (13/66) were high risk. The reference standard showed 76% (50/66) unclear risk from limited methodological detail, with no high-risk ratings. Flow and timing had the greatest uncertainty: 85% (56/66) were unclear owing to missing information on testing intervals and participant flow. Figure 2 summarizes domain-level risk distributions; Multimedia Appendix 3 reports study-level assessments.

Figure 2. Summary of QUADAS-2 risk-of-bias ratings across the 66 included studies, by domain. QUADAS-2: Revised Quality Assessment of Diagnostic Accuracy Studies tool.

Given frequent unclear and high risk in key domains, we interpreted diagnostic performance cautiously, especially without external validation or a clearly defined reference standard. Future benchmarking should emphasize transparent reporting, prespecified thresholds, and multicenter evaluation to reduce bias and improve reproducibility.


Overview

Following study selection (the complete selection process is summarized in the PRISMA 2020 flow diagram, Figure 3), we first summarize the overall profile of the included literature to contextualize the subsequent synthesis. Figure 4 provides a temporal overview (2019‐2025) of modeling approaches across included studies, illustrating how methodological focus has shifted over time and informing interpretation of the evidence base.

Figure 3. PRISMA 2020 flow diagram of study selection. AD: Alzheimer disease; PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Figure 4. Temporal trends of machine-learning methods used for AD diagnosis (2019‐2025). AD: Alzheimer disease.

Single Modality

Overview

If readers are already familiar with traditional ML and deep learning approaches, they may wish to proceed directly to the next section, which focuses on multimodal data integration for AD diagnosis.

A concise overview of these baseline methods is provided in Multimedia Appendix 4, which also includes a summary of RL. As most RL studies address sequential decision-making tasks rather than direct diagnostic modeling, their methodological details are presented in Multimedia Appendix 4, to maintain focus on multimodal diagnostic frameworks in the main text.

Deep Learning

Compared with traditional ML, deep learning enables hierarchical feature extraction, capturing complex patterns in high-dimensional data. It is therefore widely used to process and integrate AD-related multimodal inputs, including neuroimaging, clinical scores, genetics, and speech. Key approaches and findings are summarized below.

Recurrent neural networks are effective for modeling sequential data such as longitudinal clinical records and speech signals, but they are susceptible to vanishing gradients in long sequences [46-49]. Long short-term memory networks address this limitation through gated memory mechanisms, enabling more stable training and improved capture of long-term dependencies. Consequently, long short-term memory models have been widely applied in AD research for analyzing temporal and sequential modalities [50-53].
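The gated memory mechanism described above can be shown in a few lines. The sketch below implements a single scalar LSTM cell in pure Python; the weights are arbitrary illustrative values, not trained parameters of any reviewed model.

```python
import math

# Minimal scalar LSTM cell illustrating the gating that mitigates vanishing
# gradients. Weights are arbitrary illustrative values, not trained ones.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    f = sigmoid(w["f"] * x + w["uf"] * h_prev)          # forget gate
    i = sigmoid(w["i"] * x + w["ui"] * h_prev)          # input gate
    o = sigmoid(w["o"] * x + w["uo"] * h_prev)          # output gate
    c_tilde = math.tanh(w["c"] * x + w["uc"] * h_prev)  # candidate memory
    c = f * c_prev + i * c_tilde                        # gated memory update
    h = o * math.tanh(c)                                # hidden state
    return h, c

w = {"f": 0.5, "uf": 0.1, "i": 0.5, "ui": 0.1,
     "o": 0.5, "uo": 0.1, "c": 1.0, "uc": 0.2}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:   # a short toy input sequence
    h, c = lstm_step(x, h, c, w)
```

Because the cell state `c` is updated additively through the forget and input gates rather than being repeatedly squashed, gradients flow across long sequences more stably than in a plain recurrent network.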

The transformer model leverages attention mechanisms to dynamically assign weights to input features based on their relative importance. Each layer contains multiple attention heads, allowing the model to capture diverse feature representations while remaining efficient and scalable to train [54]. Owing to these advantages, transformers have been widely adopted in AD diagnosis and multimodal learning, where their encoder-decoder architecture facilitates effective integration of heterogeneous data sources [13,55-57].
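The weighting mechanism at the core of the transformer is scaled dot-product attention. The toy sketch below computes it for a single query over a two-item sequence in pure Python (one head, no learned projections), purely to illustrate how similarity scores become mixing weights.

```python
import math

# Scaled dot-product attention for one query -- a toy illustration of the
# transformer's weighting mechanism (no framework, no learned projections).
def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is the weight-mixed combination of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[1.0, 2.0], [3.0, 4.0]])
```

Here the query is more similar to the first key, so the output leans toward the first value vector; multihead attention simply runs several such weightings in parallel over learned projections.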

Ensemble Learning

Ensemble learning improves generalization and robustness by combining multiple base models, including bagging and boosting methods such as AdaBoost (Adaptive Boosting), XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient-Boosting Machine), and has been widely applied in AD detection and progression prediction [12,58-60]. However, ensemble models may introduce redundant features, offer limited gains on small datasets, and incur higher computational costs, which can restrict real-time or resource-constrained deployment.
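To make the ensemble principle concrete, the toy sketch below uses soft voting (averaging the probability estimates of several base scorers), which is simpler than the boosting methods named above. The three scorers, their feature names, and all cutoffs are invented for illustration and are not taken from any reviewed study.

```python
# Toy soft-voting ensemble: average the risk estimates of three hand-written
# scorers. Features, ranges, and weights are invented for illustration only.
def scorer_age(x):        # older age -> higher assumed risk
    return min(1.0, max(0.0, (x["age"] - 50) / 40))

def scorer_mmse(x):       # lower cognitive test score -> higher assumed risk
    return min(1.0, max(0.0, (30 - x["mmse"]) / 15))

def scorer_hippo(x):      # smaller hippocampal volume -> higher assumed risk
    return min(1.0, max(0.0, (4.0 - x["hippo_ml"]) / 2.0))

def ensemble_prob(x):
    scorers = [scorer_age, scorer_mmse, scorer_hippo]
    return sum(s(x) for s in scorers) / len(scorers)

patient = {"age": 78, "mmse": 21, "hippo_ml": 2.8}
p = ensemble_prob(patient)            # averaged risk estimate in [0, 1]
label = "high risk" if p >= 0.5 else "low risk"
```

Bagging and boosting replace these fixed scorers with models trained on resampled or reweighted data, but the final step, combining several imperfect learners into one steadier decision, is the same.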

Summarization for Single Modality

Traditional single-modality ML approaches can achieve high performance in AD-related tasks; however, they are constrained by several inherent limitations:

First, regarding information completeness, structural MRI alone has limited sensitivity to functional and molecular changes and cannot fully capture AD-related cognitive and behavioral alterations. Combining MRI with PET, neuropsychological tests, speech, electroencephalography (EEG), and genetic or biomarker data provides a more complete, multidimensional view of disease progression and patient heterogeneity [61].

Second, regarding model robustness, in multimodal data, residual noise in one modality may persist despite denoising, but other modalities can provide complementary signals that improve robustness. Leveraging multisensory-style integration, multimodal models better reflect biological cognition and can yield more reliable decisions [62].

Third, in cross-modal learning, transformer architectures use cross-modal attention to learn associations between modalities. Some studies apply them in weakly supervised or cross-modal guided settings, using one modality to constrain or guide representation learning in another [63].

Fourth, in real-world decision-making, multimodal learning better matches real-world diagnosis, which integrates multiple information sources. Using diverse modalities aligns models with clinical workflows and improves translational potential in practice [64].

Therefore, this review analyzes the methodological strengths and limitations of multimodal models for computer-assisted AD diagnosis, focusing on how dataset grouping and classification choices affect evaluation. By classifying datasets, we enable model comparisons under a unified setup, allowing for more direct assessment of generalizability and cross-dataset stability.

Multimodal Data

Multimodal Dataset Overview

High-quality data plays a pivotal role in training AI models for computer-aided diagnosis and detection. Robust datasets not only enhance the generalization ability of models but also help mitigate the risk of overfitting. Commonly used datasets for AI-assisted diagnosis of AD can be broadly categorized into 2 types. The first type, multimodal clinical phenotyping datasets such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI), UK Biobank, and the Open Access Series of Imaging Studies (OASIS), focuses on neuroimaging modalities, including MRI, functional MRI, genetic data, and electronic health records. The second type, multimodal cognitive-linguistic behavioral datasets such as the Pitt Corpus and ADReSS (Alzheimer’s Dementia Recognition Through Spontaneous Speech), centers on sequential data modalities, including audio, video, and transcribed language. Each dataset type provides unique features that contribute to the comprehensive modeling of AD progression and diagnosis. The commonly used datasets, along with their population demographics and associated modalities, are summarized in Table 1.

Table 1. Commonly used datasets.

| Dataset | Population, male, n (%) | Age (years), mean (SD) | Modalities | Link |
|---|---|---|---|---|
| UK Biobank | 23,000 (46)a | 56.5 (8.1)b | MRIc, fMRId, genetic, lifestyle scores, activity monitor, and EHRe | [65] |
| ADNIf | | | | |
| ADNI-1 | 469 (57.3)a | 75 (6.9)b | MRI, PETg, genetic, and EHR | [66] |
| ADNI-GOh and 2 | 473 (53)a | 72.5 (7.2)b | MRI, PET, genetic, and EHR | [66] |
| ADNI-3 | 471 (49)a | 74.9 (8.1)b | MRI, PET, genetic, and EHR | [66] |
| OASISi | | | | |
| OASIS-1 | 177 (42.5)a | 57 (39)b | MRI, PET, CT, genetic, and EHR | [67] |
| OASIS-2 | 60 (40)a | 78 (18)b | MRI, PET, CT, genetic, and EHR | [67] |
| OASIS-3 | 622 (45)a | 69 (26.56)b | MRI, PET, CT, genetic, and EHR | [67] |
| OASIS-4 | 663 subjects | 57.5 (36.5)b | MRI, PET, CT, genetic, and EHR | [67] |
| NACCj | 23,625 (44.62)a | 73.3 (10.5)b | MRI, PET, genetics, and EHR | [68] |
| FHSk | 718 (42)a | 80.76 (8.2)b | MRI, genetic, and EHR | [69] |
| AIBLl | 289 (43.72)a | 73.5 (7.03)b | MRI, PET, genetic, and EHR | [70] |
| DementiaBank | | | | |
| Pitt Corpus | HCm: 104, ADn: 208/552o | — | Audio and text | [71] |
| ADReSSp | HC: 78, AD: 78/156 | — | Audio and text | [71] |
| ADReSSoq | HC: 115, AD: 122/237 | — | Audio and text | [71] |
| ADReSS-Mr | HC: 143, AD: 148/291 | — | Audio and text | [71] |
| TAUKADIALs | 106/507 | — | Audio and text | [71] |
| Multimodal dementia corpus | HC: 10, AD: 12/816 | — | Audio, typed, and hand-written | [72] |
| ADReFVt | AD: 25 | 66.68 (2.08)b | Video | [73] |
| GENCODEu | Human: 78,686 | — | Genetics | [74] |
an (%).

bMean (SD).

cMRI: magnetic resonance imaging.

dfMRI: functional magnetic resonance imaging.

eEHR: electronic health record.

fADNI: Alzheimer’s Disease Neuroimaging Initiative.

gPET: positron emission tomography.

hADNI-GO: Alzheimer’s Disease Neuroimaging Initiative – Grand Opportunity.

iOASIS: Open Access Series of Imaging Studies.

jNACC: National Alzheimer’s Coordinating Centre.

kFHS: Framingham Heart Study.

lAIBL: Australian Imaging, Biomarkers and Lifestyle Study.

mHC: healthy control.

nAD: Alzheimer disease.

oNot available.

pADReSS: Alzheimer’s Dementia Recognition Through Spontaneous Speech.

qADReSSo: Alzheimer’s Dementia Recognition Through Spontaneous Speech 2021 Challenge.

rADReSS-M: Multilingual Alzheimer’s Dementia Recognition Through Spontaneous Speech Challenge.

sTAUKADIAL: Speech-Based Cognitive Assessment in Chinese and English.

tADReFV: Alzheimer’s Disease Recognition from Face & Voice.

uGENCODE: Encyclopedia of Genes and Gene Variants.

To better understand current research trends, this review categorized studies from the past 5 years on multimodal AI models for AD diagnosis by dataset type. (1) Multimodal clinical phenotyping datasets: ADNI dominates this category, used in about 80% of studies, while others, such as UK Biobank, OASIS, National Alzheimer’s Coordinating Centre (NACC), Framingham Heart Study, and Australian Imaging, Biomarkers and Lifestyle Study, account for the remaining 20%, mainly for supplementary analysis or external validation. (2) Multimodal cognitive-linguistic behavioral datasets: the ADReSS series is most widely used, representing around 70% of studies; the remaining 30% use other corpora, often for complementary analysis or benchmarking.

Multimodal Clinical Phenotyping Datasets

Multimodal clinical phenotyping datasets integrate MRI, PET, or diffusion tensor imaging, biomarkers from blood, cerebrospinal fluid, or genomics, and standardized cognitive assessments. This review summarizes representative resources, highlighting their modalities, distinguishing features, and contributions to diagnostic modeling (Table 2).

Table 2. Studies using multimodal clinical phenotyping datasets. The exceptionally high performances reported in some of these studies can be attributed to specific methodological factors: (1) for two studies [75,76], the absence of external validation likely inflated the results; (2) for another study [77], the use of a small but highly controlled dataset, extensive sample expansion, multimodal feature fusion, and pronounced disease-related electrophysiological signatures contributed to the elevated accuracy; and (3) for yet another study [78], the integration of rich gait features with optimized machine learning techniques in a controlled experimental setting facilitated unusually high performance.
| Study | Datasets | Model type | Type of task | Modalities | Outcomes | Validation | Results | Limitations |
|---|---|---|---|---|---|---|---|---|
| Xue et al [79], 2024 | NACCa, ADNIb, AIBLc, FHSd, PPMIe, and OASISf | Transformer-based multimodal model | Differential diagnosis of 10 etiologies | MRIg (T1, T2, FLAIRh), clinical, neuropsychological tests, and PETi | Differential diagnosis probabilities and ADj/MCIk/NCl classification | Internal: NACC held out; external: ADNI and FHS | Etiology classification AUROCm 0.96; strong alignment with PET biomarkers and neuropathology | Imbalanced etiologies, training label subjectivity, and limited racial diversity |
| Shi et al [80], 2018 | ADNI | MM-SDPNn | Classification (AD vs NC, MCI vs NC) | T1 MRI and FDG-PETo | Classification accuracy | Comparative vs single-modality and state-of-the-art models | Outperformed single-modality DPNp/SDPNq and concatenated models | ADNI-only dataset (limits generalizability); ROIr-based features rather than voxel-wise |
| Allwright et al [81], 2023 | UK Biobank | XGBoosts | Risk prediction (incident AD) | Demographics, lifestyle, genetics, and medical history | Prediction of incident AD (2-10 years) and risk factor ranking | Internal: nested 3-fold cross-validation; external: held-out validation set | AUROC 0.77; APOE-ε4t identified as the strongest risk factor, with liver enzymes and frailty as predictors | ICD-10u underascertainment, healthy volunteer bias, and observational design |
| Gu et al [82], 2025 | UK Biobank | LightGBMv | Risk prediction (incident dementia in ASCVDw patients) | Clinical, biological assays, cognitive tests, and physical measures | All-cause incident dementia, AD, and VDx incidence | Temporal: train (2006-2009) and test (2010 cohort) | 5-year dementia AUCy 0.903, AD AUC 0.775, and accuracy 0.851 | Sample mostly European descent, static baseline features, and potential overfitting |
| You et al [83], 2022 | UK Biobank | LightGBM | Risk prediction (5/10-year horizon) | Demographics, lifestyle, blood biomarkers, and genetics | Incident all-cause dementia and AD prediction | Internal: 5-fold cross-validation | AUC 0.848 (all-cause) and 0.862 (AD); outperformed CAIDEz and DRSaa | Limited external validation, population predominantly White, and fully data-driven feature selection |
| Calvo et al [84], 2024 | UK Biobank | Multivariable logistic regression | Risk association analysis | Questionnaire, ICDab records, and genotypes | Odds of AD related to menopause type | Single cohort: multivariable adjustment | Early bilateral oophorectomy associated with 4-fold AD odds (ORac 4.12); HTad use protective | Low case numbers in subgroups; self-reported HT use and healthy volunteer bias |
| Yi et al [85], 2025 | UK Biobank, ADNI, PPMI, and IXIae | 3D-ViTaf | BAGag estimation and GWASah | T1-weighted MRI, genetics (SNPai and xQTLaj) | BAG and drug target prioritization | External: ADNI, PPMI, and IXI | MAEak ≈2.6; identified 7 high-confidence drug targets (eg, MAPTal and TNFSF12am) | European-ancestry bias; lack of biological “ground truth” for brain age |
| Yousefzadeh et al [86], 2024 | UK Biobank (retina cohort) | VGG-16an classifier + LAVAao (XAIap) | Binary classification and explainability | Retinal fundus images | AD vs NC classification and neuron-level explanations | Internal: nested 5-fold cross-validation | Accuracy 71.4%; identified 7 latent clusters linking vascular and cognitive decline | Small AD sample size (n=100), cross-sectional design, and UK Biobank volunteer bias |
| Gong et al [87], 2023 | UK Biobank | SuperBigFLICA (semisupervised Bayesian fusion) | Phenotype discovery | Multimodal MRI (47 modalities) | Latent components predictive of nonimaging phenotypes | Internal: train, validation, or test split | Up to 46% improvement over expert IDPsaq and interpretable multimodal modes | Linear modeling constraints and UK Biobank population bias |
| Lian et al [88], 2022 | ADNI-1, ADNI-2, and AIBL | Attention-guided HybNet (3D FCNar + hybrid network) | Diagnosis and prognosis | Structural T1 MRI | AD vs NC classification; pMCIas vs sMCIat prediction | External: trained ADNI-1, validated ADNI-2 and AIBL | ADNI-2 AD vs NC accuracy 0.919 (AUC 0.965); outperformed ROI/VBMau methods | Heavy preprocessing reliance, limited demographic diversity, and potential overfitting |
| Lian et al [89], 2022 | ADNI-1 and ADNI-2 | MWANav | Joint regression of clinical scores | Structural T1 MRI and clinical scores | MMSEaw, CDRSBax, and ADAS-Cogay prediction | Cross-validation across ADNI-1 and ADNI-2 | Lower RMSEaz and higher correlation coefficients than single-task baselines | Restricted to the ADNI cohorts and potential overfitting to the modest sample size |
| Li et al [90], 2019 | ADNI-1, ADNI-GO/2ba, and AIBL | 3D CNNbb + Cox proportional hazards | Time-to-event prognosis | Hippocampal MRI patches and clinical variables | Progression from MCI to AD and risk stratification | External: trained ADNI-1, validated ADNI-GO/2 and AIBL | C-index 0.864 (combined model) and significant risk-based stratification of MCI | Focus on the hippocampus only and potential cohort and scanner bias (1.5T vs 3T) |
| Qiu et al [15], 2022 | NACC, ADNI, and ADCPbc | Multimodal deep learning (3D CNN + FCN) | Multiclass classification | Structural MRI, demographics, and neuropsychology | Diagnosis (NC, MCI, AD, and nADDbd) and saliency maps | External: trained NACC, validated ADNI and independent cohorts | Performance comparable to neurologists; saliency aligned with pathology | Retrospective design and heterogeneity in protocols across cohorts |
| Oh et al [91], 2023 | ADNI | LEARbe framework (CNN + RLbf + XAI) | Diagnosis and interpretation | Structural T1 MRI | AD vs non-AD classification and counterfactual maps | Internal: cross-validation on ADNI | Improved accuracy and generalization; localized plausible atrophy patterns | Single cohort (ADNI); XAI evaluation partly qualitative |
| Lian et al [92], 2020 | ADNI-1 and ADNI-2 | Hierarchical FCN | Diagnosis and atrophy localization | Structural T1 MRI | AD vs NC, MCI vs NC, and atrophy pattern mapping | External: trained ADNI-1, tested ADNI-2 | Improved accuracy vs conventional features and interpretable atrophy maps | ADNI-only; strong reliance on preprocessing and registration |
| Avsec et al [93], 2021 | Genomic reference datasets | Enformer (transformer) | Genomic prediction | DNA sequence | Gene expression and chromatin state prediction | Internal: held-out chromosomes | Improved capture of long-range regulatory effects vs previous models | Limited to available cell types and assays |
| Yang et al [94], 2021 | ADNI | Deep learning and super learner | Prognosis | MRI, cognitive, and biomarkers | Diagnostic classification and prognostic risk signature | Internal: cross-validation within ADNI | Derived signature distinguished diagnostic groups and progression risk | Limited external validation; restricted to the ADNI research cohort |
| Lee et al [95], 2024 | ADNI and UK or Singapore clinics | PPMbg | Prognosis (MCI to AD) | MRI (gray matter) and cognitive tests | Individualized prognostic index | External: independent real-world memory clinics | Accuracy ≈81.7%, AUC ≈0.84; index predicted conversion better than atrophy alone | Heterogeneity in real-world clinical data and potential site effects |
| Zhu et al [96], 2021 | ADNI and AIBL | DA-MIDLbh | Diagnosis | Structural MRI patches | AD vs NC and MCI vs NC | External: trained ADNI, tested AIBL | Higher accuracy and generalizability than baselines; attention maps aligned with pathology | Reliance on structural MRI and potential dataset-specific overfitting |
| Zhang et al [97], 2024 | ADNI | GCNbi, SHAPbj, and automatic fusion | Diagnosis | Cognitive, MRI, PET, and risk factors | AD vs non-AD diagnosis and multimodal feature selection | Internal: two ADNI multimodal cohorts | Accuracies of 95.9% and 91.9%; efficient selection of clinically important features | Complex model deployment and reliance on ADNI data |
| Velazquez and Lee [75], 2022 | ADNI EMCIbk | Ensemble (random forest and CNN) | Prediction of conversion | DTIbl (ADCbm maps) and EHRbn | EMCI to AD conversion prediction | Internal: held-out test set | 98.81% accuracy; feature importance explainability provided | Small converter sample size and potential overfitting |
| Zhang et al [76], 2024 | ADNI | Multimodal learning machine (ELMbo ensemble) | Diagnosis | MRI features and neuropsychological tests | NC, MCI, and AD classification | Internal: cross-validation on ADNI | >98% accuracy and F1-score; no observed bias between MCI and AD | Single research cohort; very high accuracy requires external verification |
Bi et al [98], 2020ADNICluster evolutionary random forestDiagnosisResting-state fMRIbp and SNPAD vs control classification and biomarker identificationComparative vs competing methodsIdentified significant brain region-gene pairs and effective classificationSmall multimodal sample size and complex hyperparameters
Bi et al [99], 2022ADNIWeighted evolutionary random forestPathogen detectionResting-state fMRI and SNPMCI identification and pathogenic factor extractionComparative vs state-of-the-art methodsSuperior MCI identification performance, and highlighted key ROIs and SNPsHigh-dimensional fusion features, small N, and overfitting risk
Hashmi and Barukab [100], 2023OASISDeep RL and neural networkStagingStructural MRI4-class dementia stagingInternal: augmented vs baselineRL augmentation improved accuracy by ≈6% and recall by ≈13%Single open dataset; focus on MRI only
Wang et al [101], 2024ADNI-1/2/3Multimodal DLbq with an interaction layerPrognosis (MCI to AD)MRI, clinical, and genetics (SNP)4-year conversion predictionExternal: generalized to ADNI-3AUC 0.962 (cross-validation), 0.939 (test); interaction effects improved accuracyADNI-only and potential overfitting despite cross-validation
Hatami et al [102], 2024ADNIDNNbr and RL (data augmentation)ClassificationStructural MRIAD vs NC classificationComparative vs baseline augmentation approachesPrecision ≈0.95; RL-guided augmentation outperformed baselinesSingle research cohort and no external clinical validation
Tabarestani et al [103], 2020ADNIDistributed multitask regressionLongitudinal progressionMRI, PET, CSFbs, EEGbt, and clinicalPrediction of longitudinal cognitive scoresComparative vs unimodal or multimodal methodsReduced errors, particularly in sparse or incomplete longitudinal dataModel complexity and potential sensitivity to hyperparameters
Burkhart et al [104], 2024ADNI and Singapore Memory ClinicUnsupervised multimodal trajectory modelingPrognosisCognitive, amyloid PET, and MRICognitive health clustering and progression predictionExternal: real-world memory clinic dataBetter stratification than standard clinical assessments and robust to missing dataUnsupervised complexity and reliance on ADNI for training
El-Sappagh et al [105], 2021ADNIRandom forest and SHAP (multilayer)Diagnosis and progression11 modalities (MRI, PET, CSF, and clinical)Multiclass diagnosis and MCI progression detectionInternal: cross-validationDiagnosis accuracy 93.95%, progression accuracy 87.08%, and interpretableHigh complexity and challenges for routine care deployment
Lee et al [106], 2024ADNI and 4 Korean hospitalsGBMbuConversion predictionMRI (T1, T2-FLAIRbv), amyloid PET, and clinicalMCI to AD conversion (4-year)Internal: nested cross-validation with modality combinationsT1 and amyloid PET is the best combination, and T2-FLAIR did not improve predictionSmall multicenter sample and site and scanner heterogeneity
Yuan et al [107], 2021ADNIMultimodal cotraining (random forest)MCI subtype classificationStructural MRI and SNPsMCI vs pMCI classificationExternal: ADNI-2 independent test setAccuracy 85.5% and cotraining outperformed single modalityDependence on feature selection and ADNI-only
Cirincione et al [108], 2024TADPOLEbw (ADNI)Ensemble integrationPredictionMRI, PET, clinical, and cognitiveFuture dementia prediction in MCIInternal: held-out test setAUC 0.81, and outperformed XGBoost and deep learning baselinesSingle research dataset and the complexity of multimodal integration
Cassani and Falk [109], 2020Clinical EEGFeature engineering and MLDiagnosis and severityResting-state EEGAD vs normal, and mild vs moderate AD classificationInternal: cross-validationModulation spectral features outperformed traditional EEG featuresSmall sample size, resting state only, and single center
Cilia et al [110], 2021Custom (Naples)Deep transfer learning (CNN)DiagnosisOnline handwriting (dynamic)Early AD detectionInternal: cross-validationDynamic features (color-encoded) are superior to shape-only imagesSingle-center dataset and task-specific protocol
Kmetzsch et al [111], 2022PREV-DEMALSbxSupervised variational autoencoderDisease progression modelingMRI and microRNADisease progression score (FTDby/ALSbz)Validation: synthetic data and cohort evaluationOutperformed competing models in capturing progression trajectorySmall sample (rare disease) and cross-sectional data used for progression
Mengoudi et al [112], 2020UCLca and Insight 46Self-supervised deep neural networkDiagnosisEye-tracking (gaze or pupil)Dementia vs control classificationComparative vs handcrafted featuresSelf-supervised features are more sensitive than handcrafted metricsModest sample size, mixed dementia subtypes, and specialized hardware
Tsai et al [113], 2024Taiwan NHIcbMANDccIncidence predictionEHR (ICD codes) and demographicsDementia incidence riskInternal: held-out test setAUC 0.901 and outperformed traditional CTRcd modelsCoding errors in administrative data are specific to the Taiwan NHI
Park et al [22], 2024Korean memory clinicsSVMceDiagnosis (MCI vs HCcf)VRcg biomarkers, MRI, and neuropsychological testsMCI vs healthy control classificationInternal: train or test splitVR, MRI AUC 0.89, and VR biomarkers comparable to MRI aloneSmall sample (n=54) and VR hardware requirement
Wu et al [114], 2022Clinical EEGWiGMMchSeverity detectionResting-state EEGUnsupervised dementia degree detectionInternal: latent structure analysisCaptured latent dementia degrees matching clinical statusUnsupervised labeling requires careful interpretation
Zhang et al [115], 2025Chinese memory clinicsFCRNci and MLPcj (patch-based)DiagnosisMRI, PET, clinical, and genotypeAD vs normal and MCI vs normal classificationInternal: cross-validationAccuracy ≈96% (AD), ≈92% (MCI), and interpretable probability mapsSingle-country clinical cohorts and limited ethnic diversity
Fabietti et al [77], 2023Mouse modelsEnsemble machine learningEarly detection (animal)LFPckAD vs control mouse classificationInternal: channel masking robustness testsAccuracy 99.4% and robust to artifactsPreclinical animal model results and small sample size
Seifallahi et al [78], 2022Single centerSVMDiagnosisKinect V2 (gait or TUGcl)AD vs healthy control classificationInternal: leave-one-out cross-validationAccuracy 98.68% using 12 skeletal featuresSmall sample, case-control design may overestimate performance
Fan et al [116], 2024CVDcm patients (Wuhan)ViTcn (MRI) and XGBoost (clinical)VCIco diagnosisMRI (T1, T2-FLAIR) and clinicalVascular cognitive impairment diagnosisExternal: independent CVD datasetThe hybrid model has an AUC of 0.965 and is comparable to expert neurologistsCVD-specific cohort and complex ViT and XGBoost pipeline
Beebe-Wang et al [117], 2021Aging cohort (US)Nonlinear ML and SHAPImminent prediction (3 years)Clinical, neuropsychologicalIncident dementia within 3 yearsInternal: cross-validationSparse model (4 tests) comparable to full batteryPrediction limited to a 3-year horizon and a single health system
Battineni et al [118], 2021Public MRI datasetGradient boostingClassificationMRI features and demographicsAD vs non-AD classificationInternal: cross-validationAccuracy 97.58% (gradient boosting performed best)Small public dataset and lack of external validation

aNACC: National Alzheimer’s Coordinating Centre.

bADNI: Alzheimer’s Disease Neuroimaging Initiative.

cAIBL: Australian Imaging, Biomarkers, and Lifestyle Study.

dFHS: Framingham Heart Study.

ePPMI: Parkinson Progression Markers Initiative.

fOASIS: Open Access Series of Imaging Studies.

gMRI: magnetic resonance imaging.

hFLAIR: fluid-attenuated inversion recovery.

iPET: positron emission tomography.

jAD: Alzheimer disease.

kMCI: mild cognitive impairment.

lNC: normal control.

mAUROC: area under the receiver operating characteristic curve.

nMM-SPDN: multimodal stacked deep polynomial network.

oFDG-PET: fluorodeoxyglucose-positron emission tomography.

pDPN: deep polynomial network.

qSPDN: stacked deep polynomial network.

rROI: region of interest.

sXGBoost: Extreme Gradient Boosting.

tAPOE-ε4: apolipoprotein E epsilon 4 allele.

uICD-10: International Statistical Classification of Diseases, Tenth Revision.

vLightGBM: Light Gradient-Boosting Machine.

wASCVD: atherosclerotic cardiovascular disease.

xVD: vascular dementia.

yAUC: area under the curve.

zCAIDE: Cardiovascular Risk Factors, Aging, and Incidence of Dementia.

aaDRS: Dementia Risk Score.

abICD: International Classification of Diseases.

acOR: odds ratio.

adHT: hormone therapy.

aeIXI: Information Extraction From Images.

af3D-ViT: 3D vision transformer.

agBAG: brain age gap.

ahGWAS: genome-wide association study.

aiSNP: single-nucleotide polymorphism.

ajxQTL: molecular quantitative trait locus.

akMAE: mean absolute error.

alMAPT: microtubule-associated protein tau.

amTNFSF12: Tumor Necrosis Factor (Ligand) Superfamily, Member 12.

anVGG-16: Visual Geometry Group 16-Layer Network.

aoLAVA: Granular Neuron-Level Explainer.

apXAI: explainable artificial intelligence.

aqIDP: imaging-derived phenotype.

arFCN: fully convolutional network.

aspMCI: progressive mild cognitive impairment.

atsMCI: stable mild cognitive impairment.

auVBM: voxel-based morphometry.

avMWAN: multitask weakly-supervised attention network.

awMMSE: Mini-Mental State Examination.

axCDRSB: Clinical Dementia Rating–Sum of Boxes.

ayADAS-Cog: Alzheimer Disease Assessment Scale–Cognitive Subscale.

azRMSE: root mean square error.

baADNI-GO/2: Alzheimer’s Disease Neuroimaging Initiative – Grand Opportunity / Phase 2.

bbCNN: convolutional neural network.

bcADPC: Alzheimer Disease Prediction Challenge.

bdnADD: non-Alzheimer disease dementia.

beLEAR: learn-explain-reinforce.

bfRL: reinforcement learning.

bgPPM: Predictive Prognostic Model.

bhDA-MIDL: dual attention multi-instance deep learning.

biGCN: graph convolutional network.

bjSHAP: Shapley Additive Explanations.

bkEMCI: early mild cognitive impairment.

blDTI: diffusion tensor imaging.

bmADC: apparent diffusion coefficient.

bnEHR: electronic health record.

boELM: extreme learning machine.

bpfMRI: functional magnetic resonance imaging.

bqDL: deep learning.

brDNN: deep neural network.

bsCSF: cerebrospinal fluid.

btEEG: electroencephalography.

buGBM: Gradient Boosting Machine.

bvT2-FLAIR: T2-weighted fluid-attenuated inversion recovery.

bwTADPOLE: The Alzheimer Disease Prediction of Longitudinal Evolution.

bxPREV-DEMALS: Predict to Prevent Frontotemporal Lobar Degeneration and Amyotrophic Lateral Sclerosis.

byFTD: frontotemporal dementia.

bzALS: amyotrophic lateral sclerosis.

caUCL: University College London.

cbNHI: National Health Insurance.

ccMAND: Multimodal Attention Network.

cdCTR: clinical trial registration.

ceSVM: support vector machine.

cfHC: healthy control.

cgVR: virtual reality.

chWiGMM: Warped Infinite Gaussian Mixture.

ciFCRN: fully convolutional residual network.

cjMLP: multilayer perceptron.

ckLFP: local field potentials.

clTUG: Timed Up and Go.

cmCVD: cardiovascular disease.

cnViT: vision transformer.

coVCI: vascular cognitive impairment.

UK Biobank Dataset

UK Biobank enables population-level association studies and early-risk modeling. It has been widely used in AD diagnosis research; the notable studies summarized below illustrate this use.

Recent UK Biobank–based studies have applied diverse multimodal ML and deep learning approaches for AD risk prediction and diagnosis, integrating neuroimaging, genetic, clinical, and lifestyle data. These models generally achieved moderate to high performance (area under the curve [AUC] ≈0.77‐0.90) and demonstrated improved diagnostic utility compared with conventional assessment methods [79,81-83]. Several studies further emphasized the importance of genetic and hormonal factors in risk stratification [84,85]. In addition, explainable and semisupervised frameworks have enhanced model interpretability and scalability for population-level analysis, facilitating clinically relevant phenotyping and disease monitoring [86,87].

This section describes multimodal model implementation in the UK Biobank. As the analysis and Table 2 show, UK Biobank data support both AD diagnosis and risk prediction, but two limitations recur: class imbalance, which may bias model training, and a lack of external validation to confirm generalizability beyond the UK Biobank cohort.

ADNI Dataset

ADNI provides a rich and diverse collection of demographic information, multimodal data, and clinical assessments. Owing to its comprehensive scope and longitudinal design, it has become one of the most widely adopted benchmark datasets for computer-aided diagnosis of AD. The following studies exemplify its use:

Recent ADNI-based studies have developed a wide range of multimodal and deep learning frameworks integrating neuroimaging, genetic, cognitive, and clinical data for AD diagnosis and MCI-to-AD progression prediction. Attention-based, multitask, ensemble, and time-to-event models have enabled accurate localization of disease-related regions, improved prognostic modeling, and enhanced interpretability through explainable artificial intelligence techniques such as SHAP (Shapley Additive Explanations) and counterfactual analysis [15,75,76,88-92,97]. Several approaches further incorporated reinforcement learning, semisupervised learning, and data augmentation to improve robustness and generalizability in heterogeneous and imbalanced datasets [98-103,107]. These models typically achieved high diagnostic and prognostic performance (AUC up to ≈0.96), with some demonstrating strong external validation and clinical relevance [93-96,101,104-106,108]. Nevertheless, existing reviews and benchmarking studies have highlighted persistent limitations, including dataset bias, inconsistent evaluation protocols, and limited cross-center validation, underscoring the need for standardized and reproducible multimodal frameworks [119].

While ADNI provides a comprehensive and standardized multimodal resource for AD research and supports robust model performance, several limitations remain. These include class imbalance, underrepresentation of racially diverse populations, and limited external validation, which may bias model training and restrict generalizability across clinical settings.

Self-Collected Datasets

While public datasets such as ADNI provide standardized benchmarks, self-collected datasets enable more flexible acquisition of targeted modalities. Representative studies include the following.

Studies based on self-collected datasets have explored diverse multimodal fusion strategies. EEG- and local field potentials–based models, as well as hybrid MRI–PET–biomarker frameworks, demonstrated high diagnostic and staging accuracy and supported interpretable risk mapping [77,109,111,114,115]. In parallel, behavioral and digital biomarkers derived from handwriting, eye tracking, virtual reality, and motion capture have enabled noninvasive and low-cost screening with strong classification performance [22,78,110,112]. Large-scale real-world health records and hybrid deep learning models further facilitated population-level risk prediction and vascular cognitive impairment assessment, achieving robust AUC values above 0.90 [113,120]. Overall, self-collected datasets have expanded the scope of multimodal AD research by enabling flexible modality integration and novel biomarker discovery, while remaining constrained by limited sample sizes and heterogeneous acquisition protocols.

Self-collected datasets offer distinct advantages, including targeted modality acquisition, discovery of novel biomarkers (eg, microRNA, local field potentials, and handwriting), and enhanced real-world clinical utility. However, they typically suffer from limited sample sizes, which increases susceptibility to overfitting and compromises generalizability across diverse populations.

Multimodal Linguistic-Based Cognitive Impairment Datasets

Beyond multimodal clinical phenotyping datasets, multimodal linguistic-based cognitive impairment datasets represent an equally vital research resource. These datasets offer a noninvasive and cost-effective methodology for detecting cognitive decline, which is particularly valuable for identifying early-stage or subtle impairments where traditional neuroimaging or biomarker data may yield inconclusive results. By capturing spontaneous or semistructured speech and language patterns, these datasets have driven the development of AI methods for speech-based assessment. Recent work is summarized in Table 3.

Table 3. Studies using multimodal linguistic-based cognitive impairment datasets.
Study | Datasets | Model type | Type of task | Modalities | Outcomes | Validation | Results | Limitation
Ilias et al [121], 2023 | ADReSSa and ADReSSob | Multimodal transformer (BERTc and DeiTd) with optimal transport | Dementia detection (ADe vs non-AD) | Audio (spectrograms) and text (transcripts) | Classification metrics and calibration | Internal: ADReSS or ADReSSo | Accuracy ≈91.25%, F1-score ≈91.06%; improved calibration vs baselines | Small, curated datasets, English-only, and potential overfitting
Poor et al [122], 2024 | I-CONECTf | Multimodal cross-transformer with coattention | MCIg prediction (MCI vs NCh) | Audio, text, and vision (facial video) | AUCi scores | Internal: cross-validation | Trimodal AUC 85.3%, and outperformed unimodal (60.9%) and bimodal (76.3%) models | Single cohort (I-CONECT), cross-sectional, and complex architecture
Lin and Washington [123], 2024 | DementiaBank (Pitt) | Wav2vec (audio) and Word2Vec (text) | Dementia classification | Audio, text, and timestamps | Accuracy and AUROCj | Internal: cross-validation | Text augmentation improved accuracy to ≈80% (AUROC 90%), and timestamps added minimal value | Single corpus: timestamps lacked resolution, and a modest sample size
Ortiz-Perez et al [124], 2023 | DementiaBank (Pitt) | Multimodal ensemble (CNNk and transformer) | Prediction of dementia signs | Audio and text | Classification accuracy | Internal: held-out test sets | Text-only transformer best (accuracy 90.36%) and audio contributed less than text | Single English dataset, broad diagnosis category, and task constrained to picture description
Ilias and Askounis [125], 2022 | ADReSS (DementiaBank) | Transformer (BERT) and Siamese network | AD identification and severity estimation | Text (transcripts) | Accuracy and interpretability (LIMEl) | Internal: cross-validation | Single-task accuracy 87.50%, multitask accuracy 86.25%, and distinct linguistic patterns identified | Small dataset, text only, MMSEm treated as categorical, and no acoustic information
Wen et al [126], 2023 | DementiaBank (Pitt) | Transformer and causal counterfactual XAIn | AD detection | Text (part-of-speech tag features) | Accuracy; F1-score; feature importance | Internal: cross-validation | Accuracy 92.2%, F1-score 0.955, identified 12 key part-of-speech features linked to AD | Text only (part-of-speech), reliance on tagging accuracy, and no acoustic or imaging data
Chen et al [127], 2023 | DementiaBank (Pitt) | SpeechFormer++ (hierarchical transformer) | Paralinguistic AD detection | Audio (acoustic features) | Accuracy; F1-score | Internal: held-out test sets | Outperformed standard transformers and CNN/RNNo baselines and SOTAp performance | Single corpus, complex computation, audio only, and no cross-lingual evaluation
Zheng et al [128], 2022 | DementiaBank (Pitt) | N-gram, AWD-LSTMq, or neural models | Dementia detection | Text (context words, stop words, and part-of-speech) | Classification accuracy | Internal: held-out test data | Combined model (vocabulary and grammar) accuracy 81.54%, and grammar contributes comparably to context | Specific to task or language, and moderate performance vs multimodal approaches
Nambiar et al [129], 2022 | DementiaBank (Pitt) | Deep classifiers (BERT/ALBERTr + BiLSTMs) | Early dementia detection | Text (transcripts) | Accuracy; F1-score | Internal: train and test splits | BERT + BiLSTM accuracy 0.812; ALBERT + BiLSTM F1-score 0.81; contextual embeddings superior | Text only; reliance on manual transcripts; single dataset
Priyadarshinee et al [130], 2023 | ADReSSo-2021 | MLt classifiers (SVMu, RFv, and NNw) | AD detection | Audio and text (transcripts) | Classification accuracy | Internal: held-out test set | Text features (accuracy 88.7%) outperformed audio, and file-level features were superior to frame-level | Benchmarking context, single task, and single language
Liu et al [131], 2023 | ADReSS, ADReSSo, and the local Chinese dataset | Ensemble ML (VADx pause and acoustic) | AD detection | Audio (acoustic and VAD pause features) | Accuracy | Internal: cross-validation; cross-lingual (Chinese) | Ensemble improved accuracy by ≈8% on ADReSS, and accuracy 80% on the local Chinese dataset | Small local dataset (n=10), handcrafted features, and ensemble complexity
Shah et al [23], 2023 | ADReSS-M | Logistic regression and SVR | Cross-lingual AD detection; MMSE regression | Audio (duration, pause, and intelligibility) and metadata | Accuracy and RMSEy | External: Greek test set | English cross-validation accuracy 74.7%, Greek test accuracy 69.57%, and MMSE RMSE 4.77 (Greek) | Small Greek sample, modest accuracy, and simple ML models vs deep learning
Mahajan and Baths [132], 2021 | ADReSS | Bimodal framework (CNN-LSTMz and Speech-GRUaa) | AD detection | Audio and text | Classification accuracy | Internal: cross-validation | Bimodal enriched model improved performance by ≈6.25% over acoustic baselines | Small dataset, potential overfitting, and single task (picture description)
Mei et al [133], 2023 | ADReSS-M | Bilingual wav2vec 2.0 + XGBoostab | Cross-lingual AD detection and MMSE prediction | Audio (acoustic, silence, and low-frequency bands) | Accuracy and RMSE | External: Greek test set | Accuracy 73.9% (Greek), MMSE RMSE 4.610, and low-frequency speech aided transfer | Very small Greek sample, speech-only, and challenge context
Meerza et al [134], 2022 | ADReSS | FLac (LSTMad and feed-forward) | Privacy-preserving AD diagnosis | Audio (Mel-frequency and pause features) | Accuracy and fairness metrics | Internal: simulated FL clients | FL accuracy close to the centralized baseline, and q-FedAvg improved fairness | Simulated clients, single dataset, and relies on feature extraction
Chen et al [135], 2023 | ADReSS-M | SVM or NN on pretrained features | Cross-lingual AD detection | Audio (paralinguistic and XLSR-53ae) and text (ASRaf) | Accuracy and RMSE | External: Greek test set | Accuracy 69.6% (Greek), RMSE 4.788, and paralinguistic features transferable | Performance below monolingual systems and reliance on ASR quality
Ilias et al [121], 2023 | ADReSS | Multimodal transformer (ViTag, BERT, and GMUah) | AD detection | Audio (spectrograms) and text | Accuracy and F1-score | Internal: cross-validation | High eighties or low nineties accuracy, ViT is best for acoustic, and fusion surpassed SOTA | Small dataset, binary classification focus, and external generalization untested
Tamm et al [136], 2023 | ADReSS-Mai | Sequence models (transfer learning) | Cross-lingual AD detection and MMSE | Audio features and demographics | Accuracy and RMSE | External: Greek test set | Accuracy 82.6% (Greek), RMSE 4.345, and ranked second in the challenge | Small Greek sample, acoustic only, and transfer limited to English-Greek
Woszczyk et al [137], 2022 | ADReSS | Transformers vs traditional ML | AD detection | Audio and text | Classification accuracy | Internal: held-out test data | Data augmentation improved performance and was comparable to SOTA | Augmentations tuned for ADReSS and a single speech task
Jin et al [138], 2023 | ADReSS-M | CONSENaj ensemble (acoustic and disfluency) | Multilingual AD detection and MMSE | Audio (acoustic embeddings and disfluency) | Accuracy and RMSE | External: Greek test set | First place in the challenge, accuracy 86.69% (Greek), and RMSE 3.727 | Challenge dataset, ensemble complexity, and reliance on diarization quality

aADReSS: Alzheimer Dementia Recognition Through Spontaneous Speech.

bADReSSo: Alzheimer’s Dementia Recognition Through Spontaneous Speech only.

cBERT: Bidirectional Encoder Representations From Transformers.

dDeiT: Data-Efficient Image Transformers.

eAD: Alzheimer disease.

fI-CONECT: Identifying Cognition in the Elderly Through Conversational Engagement.

gMCI: mild cognitive impairment.

hNC: normal control.

iAUC: area under the curve.

jAUROC: area under the receiver operating characteristic curve.

kCNN: convolutional neural network.

lLIME: Local Interpretable Model-Agnostic Explanations.

mMMSE: Mini-Mental State Examination.

nXAI: explainable artificial intelligence.

oRNN: recurrent neural network.

pSOTA: state of the art.

qAWD-LSTM: average stochastic gradient descent weight-dropped long short-term memory.

rALBERT: A Lite Bidirectional Encoder Representations From Transformers.

sBiLSTM: bidirectional long short-term memory.

tML: machine learning.

uSVM: support vector machine.

vRF: random forest.

wNN: neural network.

xVAD: voice activity detection.

yRMSE: root mean square error.

zCNN-LSTM: convolutional neural network long short-term memory.

aaSpeech-GRU: Speech Gated Recurrent Unit.

abXGBoost: Extreme Gradient Boosting.

acFL: federated learning.

adLSTM: long short-term memory.

aeXLSR-53: cross-lingual speech representation-version 53.

afASR: automatic speech recognition.

agViT: vision transformer.

ahGMU: gated multimodal unit.

aiADReSS-M: Alzheimer Dementia Recognition through Spontaneous Speech – Multimodal.

ajCONSEN: complementary and simultaneous ensemble.

Recent studies have shown that multimodal fusion of speech and text using transformer-based architectures substantially improves AD detection performance, with F1-scores above 0.90 on the ADReSS and ADReSSo datasets [121,132,139]. Linguistic feature engineering and interpretable language models further enhanced classification accuracy, achieving up to 92.2% accuracy and F1-scores of 0.955 using compact part-of-speech features [124-126,128,130]. Cross-lingual approaches based on language-agnostic and transfer learning methods enabled moderate generalization, with accuracies ranging from 69% to 73.9% in English-Greek transfer settings [23,127,133,136]. To support real-world deployment, lightweight and hierarchical models achieved around 80% accuracy with reduced computational cost [131,135]. In addition, data augmentation and ensemble strategies improved robustness in low-resource scenarios, yielding F1-score gains of 5%‐7% and competitive challenge performance (accuracy 86.69%) [123,137,138].

Summarization Based on All Multimodal Datasets and Quantitative Analysis

Table 2 and Table 3 summarize the recent state-of-the-art models across the 2 major types of multimodal datasets, extracted according to the Cochrane Handbook. Full QUADAS-2 forms are available in Multimedia Appendix 5. Based on these results, the following quantitative synthesis compares performance trends across all multimodal datasets. Across the 4 major dataset categories, modality choices and model performance show clear dataset-dependent patterns, as shown in Table 4.

UK Biobank studies mainly combine MRI, clinical variables, and genetic features, with 2 diagnosis studies reporting an average accuracy of 71.4% (SD 5.2%) and 4 risk-prediction studies reaching an average AUC of 0.84 (SD 0.056).

ADNI studies use the most comprehensive modality integrations, with 3 diagnosis studies averaging 92.5% (SD 3.8%) accuracy, 3 MCI-conversion studies achieving a mean AUC of 0.922 (SD 0.045), and risk-prediction studies reaching an average AUC of 0.81 (SD 0.06); these tasks collectively achieve the strongest results, with fusion models frequently reporting AUC values above 0.95.

DementiaBank studies differ fundamentally by focusing on speech- and language-based modalities: 9 diagnosis studies report an average AUC of 0.813 (SD 0.042), and 5 cross-lingual AD-detection studies show a mean accuracy of 77% (SD 6.5%). Transformer architectures consistently outperform classical approaches, with models such as BERT + DeiT (Data-Efficient Image Transformers), BERT + ViT (vision transformer), and RoBERTa (Robustly Optimized Bidirectional Encoder Representations From Transformers Approach) + DNN (deep neural network) showing F1-scores exceeding 0.90.

Self-collected datasets are typically smaller and more heterogeneous: 3 diagnosis studies report an average accuracy of 96% (SD 2.4%), and lightweight models such as EEGNet or ViT-based hybrids demonstrate strong predictive capacity when applied to EEG or structural MRI.

Table 4. Summary of representative modality combinations and top-performing models in multimodal AIa-aided ADb diagnosis.
Dataset and task | Counts | Average performance | Best performance modalities | Related article
UK Biobank
Diagnosis | 2 | Accuracy=71.4% | Retinal fundus images | [79,81-87,140]
Risk prediction | 4 | AUCc=84% | Clinical, biological assays, cognitive tests, and physical measures | [79,81-87,140]
Other | 3 | N/Ad | Multimodal MRIe (T1, T2, MRI, etc) | [79,81-87,140]
ADNIf
Diagnosis | 3 | Accuracy=92.5% | Structural MRI features and neuropsychological tests | [15,75,76,79,80,89-92,94-108,119,141]
MCIg conversion | 3 | AUC=92.2% | Structural MRI, clinical variables, and genetics (SNPh) | [15,75,76,79,80,89-92,94-108,119,141]
MMSEi regression | 2 | No integration | Whole-brain T1-weighted MRI and clinical scores | [15,75,76,79,80,89-92,94-108,119,141]
Risk prediction | 7 | AUC=81% | MRI, PETj, clinical, and cognitive | [15,75,76,79,80,89-92,94-108,119,141]
Other | 13 | N/A | N/A | [15,75,76,79,80,89-92,94-108,119,141]
DementiaBank
Diagnosis | 9 | AUC=81.3% | Text transcripts → part-of-speech feature vectors | Table 3
Cross-lingual AD detection | 5 | Accuracy=77% | Multimodal acoustic fusion | Table 3
Other | 6 | N/A | N/A | Table 3
Self-collected datasets
Diagnosis | 3 | Accuracy=96% | MRI, PET, clinical, and genotype | [22,77,78,106,109-115,117,120,142]
Other | 6 | No integration | Different task | [22,77,78,106,109-115,117,120,142]

aAI: artificial intelligence.

bAD: Alzheimer disease.

cAUC: area under the curve.

dN/A: not available.

eMRI: magnetic resonance imaging.

fADNI: Alzheimer Disease Neuroimaging Initiative.

gMCI: mild cognitive impairment.

hSNP: single-nucleotide polymorphism.

iMMSE: Mini-Mental State Examination.

jPET: positron emission tomography.

To interpret these results and limit metric inflation, note that purely internal cross-validation tends to overestimate performance: AUC is typically ≈5‐15 points higher than with external validation. Small or tightly controlled datasets also report accuracies ≈10%‐20% above those in large, heterogeneous cohorts. Severe class imbalance can further raise accuracy while lowering F1-score or sensitivity; without correction, imbalance may inflate results by ≈5%‐12%. Cross-sectional models often score higher in single-timepoint evaluations, whereas longitudinal designs usually yield lower but more stable estimates, which are more informative for follow-up and clinical use.
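The accuracy inflation under class imbalance described above is easy to reproduce: at 10% prevalence, a model that never flags disease still scores 90% accuracy while detecting no cases. A minimal sketch with hypothetical counts (scikit-learn assumed):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical screening cohort: 900 cognitively normal, 100 AD (10% prevalence).
y_true = np.array([0] * 900 + [1] * 100)

# A degenerate majority-class model that never flags AD.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)            # 0.90: looks strong
f1 = f1_score(y_true, y_pred, zero_division=0)  # 0.00: no AD case detected
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

This is why imbalance-aware metrics (F1-score, sensitivity, balanced accuracy) are essential companions to raw accuracy in the studies synthesized here.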

These findings should be interpreted in light of substantial heterogeneity and risk of bias. Variation in sample composition, task definitions, and evaluation procedures across datasets limits direct comparison of performance metrics. QUADAS-2 also indicated frequent unclear and high-risk ratings in patient selection, reference standards, and flow or timing, especially in studies using only internal validation or selected samples. Reported metrics therefore likely represent upper-bound estimates rather than expected real-world performance, and apparent gains often reflect dataset-specific effects rather than generalizable model superiority.

Overall, the evidence shows that modality effectiveness varies substantially across datasets, transformer models deliver the highest gains in speech-language tasks, and large clinical phenotyping datasets such as UK Biobank and ADNI still rely mainly on traditional machine-learning or custom fusion frameworks rather than modern cross-modal transformers. This gap highlights an opportunity to develop transformer-based multimodal integration approaches tailored to large, heterogeneous clinical datasets.

Multimodal Fusion Taxonomy

A structured multimodal fusion taxonomy clarifies the performance of different integration strategies across datasets (Tables 2 and 3). A total of 4 main paradigms are commonly used: early, intermediate, late, and attention- or graph-based fusion.

Early fusion concatenates low-level features and performs well for aligned modalities such as MRI + PET, often achieving AUC>0.95 in ADNI studies, but it is sensitive to missing data and feature-scale heterogeneity. Intermediate fusion combines latent representations from modality-specific encoders and suits heterogeneous inputs such as MRI + speech or EEG + clinical data, as demonstrated by high performance in ADReSS-based models, although it can be unstable on small datasets. Late fusion aggregates model outputs and is robust to missing modalities, performing well in large datasets such as the UK Biobank, but it underuses fine-grained cross-modal interactions. Attention- and graph-based fusion learns cross-modal interactions explicitly, for example, through cross-attention layers or modality graphs, capturing finer dependencies at the cost of greater data and computational requirements.
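The early versus late fusion distinction can be sketched concretely. This is an illustrative toy example, not any reviewed study's pipeline: the feature dimensions, labels, and logistic regression classifiers are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical aligned cohort: 10 MRI-derived and 5 clinical features per subject.
rng = np.random.default_rng(42)
n = 200
X_mri = rng.normal(size=(n, 10))
X_clin = rng.normal(size=(n, 5))
y = (X_mri[:, 0] + X_clin[:, 0] > 0).astype(int)  # synthetic diagnosis label

# Early fusion: concatenate low-level features into a single classifier.
early = LogisticRegression().fit(np.hstack([X_mri, X_clin]), y)

# Late fusion: one model per modality, outputs averaged at decision level;
# if a modality is missing for a subject, its term can simply be dropped.
m_mri = LogisticRegression().fit(X_mri, y)
m_clin = LogisticRegression().fit(X_clin, y)
p_late = (m_mri.predict_proba(X_mri)[:, 1] + m_clin.predict_proba(X_clin)[:, 1]) / 2
```

The structural difference explains the trade-offs in the paragraph above: early fusion sees cross-modal feature interactions but requires complete, scale-compatible inputs, whereas late fusion degrades gracefully when a modality is absent.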

Across paradigms, limited modality availability and high acquisition costs remain key challenges, underscoring the need for adaptive and clinically feasible fusion strategies.


Principal Findings

This review synthesized multimodal AI studies for AD across diverse dataset families, including clinical phenotyping and cognitive-linguistic datasets. Multimodal fusion generally outperformed unimodal baselines, but the gain is dataset-dependent and should be interpreted cautiously. Strong performance in curated cohorts and constrained speech benchmarks may not generalize to population-based or multicenter settings. QUADAS-2 also indicated frequent risk of bias and unclear reporting across domains, likely inflating metrics and limiting comparability. Accordingly, headline accuracy and AUC should be treated as upper-bound estimates unless supported by external validation and transparent reporting.

Challenges and Future Directions

In recent years, multimodal models have demonstrated remarkable potential in computer-aided diagnosis and risk prediction for AD. While these methods have achieved significant successes, several challenges remain that warrant careful examination. This section summarizes the common limitations identified in existing studies and proposes directions for future research to advance the field.

Clinical and Translational Implications

Multimodal AI could support AD diagnosis through several clinical pathways. In memory clinics, models combining MRI, cognitive scores, and blood biomarkers could triage referrals, prioritizing patients for specialist review or PET. In general practice, speech-based and routine clinical-feature models could be embedded in consultations to flag early cognitive change. In radiology, MRI-clinical fusion could act as a second reader, reducing interobserver variability and supporting less experienced clinicians. Where imaging or specialist access is limited, speech, digital questionnaires, and basic clinical data could enable telemedicine-based screening and follow-up. At the population level, these models could support risk stratification and targeted monitoring. To enable real-world deployment, research should prioritize external multicenter validation, integration with electronic health records, and evaluation of regulatory feasibility, cost-effectiveness, and clinical impact.

Ethical and Regulatory Implications

Deploying multimodal AI for AD diagnosis requires ethical and regulatory safeguards. As datasets often combine imaging, clinical records, genomics, and speech, they fall under strict privacy regimes (eg, General Data Protection Regulation in the European Union; HIPAA [Health Insurance Portability and Accountability Act] in the United States), requiring explicit consent, data minimization, and secure handling, with added complexity for sensitive modalities such as speech and genomic data. Clinical deployment is also shaped by medical-AI governance frameworks (eg, the European Union AI Act, Food and Drug Administration Software as a Medical Device guidance, and UK Medicines and Healthcare products Regulatory Agency Good Machine Learning Practice), which emphasize transparency, risk management, and postdeployment monitoring. Fairness is essential because demographic imbalance can yield uneven performance across age, ethnicity, and language groups. Interpretability (eg, imaging attention maps and linguistic saliency) supports clinical accountability and aligns with explainability expectations. Future work should incorporate privacy-preserving methods, bias audits, and regulatory-aligned validation pipelines to enable responsible clinical integration.

Data Privacy and Data-Sharing Constraints

Access to multimodal AD data remains severely restricted by privacy regulations and ethical constraints, which limit data sharing and external validation. This restricts the sharing and usage of comprehensive datasets needed for robust external validation and generalizability.

Federated learning (FL) provides a technically viable privacy-preserving solution; however, differences in data formats and institutional infrastructures still impede its large-scale deployment. For instance, Meerza et al [134] pioneered FL for AD speech diagnosis using mel-frequency cepstral coefficients and pause features, maintaining model performance while ensuring privacy through q-FedAvg/q-FedSGD optimization. Nambiar [129] validated an ALBERT (A Lite Bidirectional Encoder Representations From Transformers) + BiLSTM (bidirectional long short-term memory) hybrid model on the ADReSS dataset, achieving strong performance without compromising data confidentiality. In parallel, multi-institutional collaborations leveraging publicly available datasets such as ADNI, UK Biobank, and OASIS have enabled richer external validation while adhering to rigorous privacy standards [15,79,88,95,100,139,140].

Despite encouraging results, FL still lacks harmonized protocols and interoperable platforms. This limits cross-center reproducibility and weakens clinical credibility. International collaboration also remains constrained by regulatory differences. Future work should prioritize unified federated frameworks with standardized protocols and privacy-preserving methods to enable secure global data collaboration [143,144].

As most datasets lack fully matched modalities per participant, multimodal fusion often relies on representation- or population-level integration rather than early fusion. Early fusion requires paired samples and is therefore infeasible across datasets. By contrast, late fusion and embedding-level integration can train unimodal models separately and combine them via meta-learners, cross-modal transformers, or probabilistic ensembles. Domain adaptation, transfer learning, and harmonization can also combine heterogeneous cohorts at the population level to improve generalizability. A standardized benchmark could further support this by defining shared preprocessing, label taxonomies, and evaluation metrics, enabling meaningful comparison or representation-stage fusion even without subject-level pairing.

Data Imbalance

Severe class imbalance remains a major obstacle, biasing training toward the majority class and inflating accuracy while masking low sensitivity to early disease. In addition, datasets such as the UK Biobank are dominated by White European ancestry, limiting generalizability across racially and ethnically diverse populations. Addressing this requires both technical mitigation and proactive recruitment of underrepresented groups so models better reflect population heterogeneity.

Researchers have developed data-level interventions (generative adversarial network–based augmentation, diffusion models, and resampling [123,137,138,145,146]), algorithm-level solutions (cost-sensitive, loss-focused, ensemble, and class-weighted training schemes [147-152]), and evaluation-focused remedies [153] to mitigate these biases.
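Class-weighted training, one of the algorithm-level remedies cited above, can be sketched in a few lines. The cohort sizes, feature shift, and classifier here are hypothetical; the point is only that reweighting the loss by inverse class frequency recovers sensitivity to the minority class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced cohort: 950 controls, 50 AD cases with a mean shift.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = np.array([0] * 950 + [1] * 50)
X[y == 1] += 0.8  # synthetic disease signal

# class_weight="balanced" rescales each sample's loss contribution by
# inverse class frequency, a simple algorithm-level imbalance remedy.
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

recall_plain = (plain.predict(X[y == 1]) == 1).mean()
recall_weighted = (weighted.predict(X[y == 1]) == 1).mean()
```

The trade-off noted below applies here too: weighting raises minority-class recall at the cost of more false positives among controls, so the operating point must be chosen against clinical consequences.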

Current methods frequently introduce new challenges, such as overfitting or inadequate performance in minority classes. Moreover, efforts to increase diversity remain inadequate. Future directions should focus on novel adaptive resampling methods, generative methods for synthetic minority data creation, and dedicated efforts to include and characterize underrepresented populations to ensure equitable and robust clinical applicability across diverse populations.

Lack of Standardized and Longitudinal Data

Differences in acquisition protocols and diagnostic criteria across datasets limit comparability of imaging, cognitive, and biomarker outcomes. Longitudinal evidence is also constrained: even in relatively standardized resources such as ADNI, limited long-term follow-up hampers modeling the temporal dynamics of disease progression.

Future work should standardize key acquisition elements and diagnostic criteria across longitudinal studies and strengthen coordination across institutions. Building on this, a multimodal benchmark spanning imaging, clinical, biomarker, behavioral, and linguistic modalities would enable cross-dataset validation, improve comparability, and support reproducible evaluation of new models. These steps would strengthen temporal modeling and provide more reliable evidence for clinical translation.

Dataset-Specific Limitations

Data imbalance is prevalent across many AD datasets, but the nature of this issue varies substantially between cohorts. This review, therefore, outlines the dataset-specific limitations of commonly used AD cohorts and corpora.

ADNI participants are generally healthier, with fewer comorbidities and a restricted age range (55‐90 y), limiting representativeness. Protocol differences across centers and evolving diagnostic standards introduce heterogeneity, while frequent reliance on subsets hampers comparability [154].

In the UK Biobank, dementia outcomes are derived mainly from health records, leading to potential misclassification and delayed ascertainment. Participants show strong volunteer bias, and PET or cerebrospinal fluid biomarkers are limited to a small subset, constraining multimodal analyses [155].

OASIS provides open neuroimaging data but with relatively small AD/MCI sample sizes and inconsistent modality coverage. Limited longitudinal depth and cross-scanner variability further reduce reproducibility [156].

NACC data are aggregated from multiple centers with heterogeneous recruitment and diagnostic protocols, making harmonization challenging. The cohort is clinic-based rather than population-representative, and missing biomarker modalities are common [157].

Although high quality, the Australian Imaging, Biomarkers and Lifestyle study is smaller than ADNI and NACC and is often used only for validation. Regional recruitment and protocol differences reduce ethnic diversity and cross-cohort comparability [158].

The Pitt Corpus is the most widely used speech dataset but remains small and imbalanced. Its tasks are constrained, limiting ecological validity, and cross-linguistic generalizability is poor [159].

The ADReSS series provides standardized speech benchmarks but is modest in size and restricted to English. Narrow task design and small training partitions raise concerns of overfitting and limited external validity [18].

Self-collected cohorts often involve small, single-site samples with heterogeneous acquisition protocols. Missing modalities, limited follow-up, and selection bias further restrict their generalizability [153].

Dataset challenges are compounded by unrepresentative cohorts, incomplete modalities, and poor cross-center consistency, limiting model robustness and cross-dataset generalization in AD diagnosis. Future work should improve data coordination and standardization, enable more practical sharing mechanisms, and adopt cross-cohort validation where feasible. Strengthening data quality and access is essential for translating multimodal AI methods into clinical use.

Model Interpretability and Explainability

A major limitation of multimodal ML models in clinical AD diagnosis is limited interpretability and transparency. Many high-performing models provide insufficient insight into their decision processes, which can hinder clinical adoption and reduce confidence among end users.

Efforts that have been made toward model interpretability include designing inherently transparent models. For example, some studies demonstrate emerging explainability strategies, including hybrid neuro-symbolic models [160] that generate interpretable reports and post hoc methods such as SHAP, LIME (Local Interpretable Model-Agnostic Explanations), gradient-based saliency, and graph-masking techniques [161,162], which collectively enhance transparency in multimodal AD diagnosis.
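The post hoc attribution idea behind methods such as SHAP and LIME can be illustrated without those packages. As a dependency-light stand-in, the sketch below uses scikit-learn's permutation importance, which measures how much shuffling each feature degrades performance; the features and labels are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Hypothetical tabular cohort: columns 0-1 carry signal, columns 2-4 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Permutation importance: shuffle one column at a time and record the
# drop in accuracy, a model-agnostic post hoc attribution method.
imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]  # informative columns first
```

In a clinical setting, such rankings over named features (eg, hippocampal volume, MMSE) give clinicians a first check that a model's decisions rest on plausible signals rather than confounds.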

Current interpretability methods often fail to produce explanations that clinicians can use reliably. Future work should prioritize clinically grounded explainability, including interactive visualizations and concise workflow-aligned natural-language summaries. Hybrid designs that combine deep learning with structured reasoning can further improve transparency by making decision logic explicit. For deployment, models should also report prediction uncertainty and demonstrate compatibility with clinical systems and regulatory requirements.

Beyond technical advances, incorporating patient and public involvement can improve multimodal AI development for AD. Patients and caregivers can help shape evaluation and result communication, not just act as end users, aligning explanations with patient priorities and addressing transparency and fairness. Engaging patient and public involvement earlier in model design may therefore support more interpretable and clinically usable diagnostic tools.

Heterogeneous Multiview Learning Problem

Integrating data across studies is challenging because single datasets rarely cover all modalities, forcing combinations such as ADNI with UK Biobank. However, differences in cohorts, imaging protocols, and cognitive assessment frameworks create substantial heterogeneity that limits direct pooling and comparability.

This heterogeneity hinders building unified models that generalize across nonoverlapping cohorts, so single-dataset models often fail out of domain. Platform-agnostic methods that tolerate missing or inconsistent modalities are therefore needed. Proposed solutions include shared latent-space learning [163], multibranch networks [164], and mixture-of-experts architectures [165] to support partial fusion and cross-dataset adaptation, but most still assume strong cross-domain alignment or require substantial retraining under domain shift.

Despite recent progress, multimodal methods often assume strict cross-domain alignment and require extensive retraining under domain shift or missing modalities. Future work should develop robust, platform-agnostic frameworks that adapt to changing modality availability and distribution shifts with minimal performance loss and advance representation learning to derive stable joint embeddings from heterogeneous data.

Uncertainty Quantification and Clinical Applicability

Although multimodal AD models have advanced, most studies still omit uncertainty quantification (eg, confidence or prediction intervals). Models typically provide deterministic outputs without communicating reliability, despite clinicians relying on uncertainty to guide management and treatment decisions. Future work should embed uncertainty metrics into diagnostic models to better align with clinical needs and improve interpretability, reliability, and real-world adoption.
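One simple way to attach uncertainty to an otherwise deterministic classifier is a bootstrap ensemble: refit the model on resampled cohorts and report the spread of predicted probabilities per patient. The data and model below are hypothetical placeholders for this general recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical cohort of 400 subjects with 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

# Bootstrap ensemble: each member is fit on a resampled cohort; the
# spread of member predictions is a per-patient uncertainty estimate.
probs = []
for _ in range(20):
    idx = rng.integers(0, len(X), len(X))   # sample with replacement
    m = LogisticRegression().fit(X[idx], y[idx])
    probs.append(m.predict_proba(X[:5])[:, 1])
probs = np.asarray(probs)                   # shape: (20 members, 5 patients)
mean_p, std_p = probs.mean(axis=0), probs.std(axis=0)
```

Reporting mean_p with std_p (or a percentile interval) lets clinicians distinguish a confident borderline score from an unstable one, which is the clinical need this subsection identifies.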

Risks of Data Leakage in Multimodal AI Modeling

Another limitation is data leakage, which can inflate performance. Common forms include subject-level leakage (samples from the same participant in both training and test sets), patch-level overlap in MRI slice and patch models, and transcript or utterance-level leakage in speech datasets when multiple segments come from 1 individual. Many studies did not report whether participant-independent splits were enforced. Clearer reporting of partitioning and rigorous participant-level cross-validation are therefore essential to ensure real-world generalizability.
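Participant-independent splitting, the remedy named above for subject-level leakage, is directly supported by grouped cross-validators. A minimal sketch with a hypothetical speech corpus (scikit-learn assumed):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical speech corpus: 3 utterances from each of 40 participants.
groups = np.repeat(np.arange(40), 3)      # participant ID per utterance
X = np.zeros((len(groups), 1))            # placeholder features

# GroupKFold keeps all of a participant's utterances in one fold, so no
# subject contributes to both the training and test sets.
gkf = GroupKFold(n_splits=5)
splits = list(gkf.split(X, groups=groups))
for train_idx, test_idx in splits:
    assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```

A plain KFold over utterances would, by contrast, routinely place segments from the same speaker on both sides of the split, which is exactly the leakage pattern described above.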

Conclusions

This review synthesizes evidence on multimodal AI approaches for AD across clinical, neuroimaging, genetic, and linguistic data, systematically comparing modeling strategies, validation practices, and performance trends across heterogeneous datasets. In contrast to prior modality-specific reviews, the findings show that multimodal models generally outperform unimodal approaches, although performance varies substantially with dataset characteristics, modality availability, and cross-source alignment. High accuracies are often reported in curated or internally validated cohorts, whereas population-based and externally validated studies yield more modest but clinically realistic results, reflecting substantial heterogeneity and risk of bias.

Despite these limitations, the evidence demonstrates that multimodal AI captures complementary biological and behavioral signals relevant to AD, offering clear advantages for diagnosis and risk prediction. Transformer-based architectures and speech- or behavior-derived modalities show promise for scalable and noninvasive early detection. However, meaningful clinical translation will require harmonized benchmarking, transparent reporting, and rigorous external validation. Overall, this review advances the field by contextualizing performance gains within their methodological constraints and by outlining practical directions for developing robust, interpretable, and generalizable multimodal AI systems. These insights support the responsible integration of AI into real-world dementia screening, risk prediction, and early intervention strategies.

Acknowledgments

The authors declare the use of generative artificial intelligence (GenAI) in the research and writing process. According to the GAIDeT (Generative Artificial Intelligence for Digital Twins) taxonomy (2025), the following tasks were delegated to GenAI tools under full human supervision: proofreading and editing. The GenAI tool used was ChatGPT-5.2. Responsibility for the final manuscript lies entirely with the authors. Declaration submitted by: JMIR Publications. We used ChatGPT-5.2 (OpenAI) to conduct a grammatical review of the abstract and conclusion sections.

Funding

This work received no specific financial or nonfinancial support. No funder or sponsor had any role in the design of the review; data collection, analysis, or interpretation; writing of this paper; or the decision to submit for publication.

Data Availability

This systematic review did not generate any new datasets. All data extracted and analyzed in this systematic review were obtained from publicly available publications included in the review. No additional unpublished or proprietary data were used.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Full database search strategies for PubMed, Scopus, IEEE Xplore, and ACM Digital Library, including complete Boolean queries, search fields, filters, and publication date limits used for study identification.

DOCX File, 999 KB

Multimedia Appendix 2

Performance evaluation for AD diagnosis. AD: Alzheimer disease.

DOCX File, 22 KB

Multimedia Appendix 3

Complete QUADAS-2 risk-of-bias assessments for all included studies, summarizing judgments across patient selection, index test, reference standard, and flow or timing, with detailed study-level ratings. QUADAS-2: Revised Quality Assessment of Diagnostic Accuracy Studies Tool.

DOCX File, 38 KB

Multimedia Appendix 4

Overview of traditional machine-learning models applied in Alzheimer disease research, including SVM, decision trees, HMMs, KNN, logistic regression, GMMs, and foundational CNN or RL descriptions, with methodological principles and limitations. CNN: convolutional neural network; GMM: Gaussian mixture model; HMM: hidden Markov model; KNN: k-nearest neighbors; RL: reinforcement learning; SVM: support vector machine.

DOCX File, 42 KB

Multimedia Appendix 5

Cochrane Handbook 5.3.3–aligned data-extraction tables summarizing study design, datasets, participants, modalities, preprocessing, model architectures, validation schemes, outcomes, and limitations for all included studies.

DOCX File, 185 KB

Checklist 1

Completed PRISMA 2020, PRISMA-S checklist, and PRISMA expanded checklist specifying reporting locations for all required items, including eligibility criteria, search methods, extraction procedures, bias assessments, and synthesis reporting. PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses; PRISMA-S: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for literature searches.

PDF File, 11026 KB

  1. Scheltens P, De Strooper B, Kivipelto M, et al. Alzheimer’s disease. Lancet. Apr 24, 2021;397(10284):1577-1590. [CrossRef] [Medline]
  2. 2024 Alzheimer’s disease facts and figures. Alzheimers Dement. May 2024;20(5):3708-3821. [CrossRef]
  3. World Alzheimer Report 2024. Alzheimer’s Disease International; 2024. URL: https://www.alzint.org/resource/world-alzheimer-report-2024/ [Accessed 2026-02-10]
  4. Kaštelan S, Gverović Antunica A, Puzović V, et al. Non-invasive retinal biomarkers for early diagnosis of Alzheimer’s disease. Biomedicines. Jan 24, 2025;13(2):283. [CrossRef] [Medline]
  5. Castellano G, Esposito A, Lella E, Montanaro G, Vessio G. Automated detection of Alzheimer’s disease: a multi-modal approach with 3D MRI and amyloid PET. Sci Rep. Mar 3, 2024;14(1):5210. [CrossRef] [Medline]
  6. Bi Y, Abrol A, Fu Z, Calhoun VD. A multimodal vision transformer for interpretable fusion of functional and structural neuroimaging data. Hum Brain Mapp. Dec 1, 2024;45(17):e26783. [CrossRef] [Medline]
  7. Yu Q, Ma Q, Da L, et al. A transformer-based unified multimodal framework for Alzheimer’s disease assessment. Comput Biol Med. Sep 2024;180:108979. [CrossRef] [Medline]
  8. Leng Y, He Y, Amini S, et al. A GPT-4o-powered framework for identifying cognitive impairment stages in electronic health records. npj Digit Med. Jul 3, 2025;8(1):401. [CrossRef]
  9. Balabin H, Tamm B, Spruyt L, et al. Natural language processing-based classification of early Alzheimer’s disease from connected speech. Alzheimer's Dement. Feb 2025;21(2):e14530. [CrossRef] [Medline]
  10. Yang X, Hong K, Zhang D, Wang K. Early diagnosis of Alzheimer’s disease based on multi-attention mechanism. In: Fati SM, editor. PLOS ONE. 2024;19(9):e0310966. [CrossRef]
  11. Wijeratne PA, Alexander DC. Learning transition times in event sequences: the event-based hidden markov model of disease progression. Inf Process Med Imaging. Jun 2021;12729(14):583-595. [CrossRef]
  12. Huh YJ, Park JH, Kim YJ, Kim KG. Ensemble learning-based Alzheimer’s disease classification using electroencephalogram signals and clock drawing test images. Sensors (Basel). May 2, 2025;25(9):2881. [CrossRef] [Medline]
  13. Karasu E, Baytaş İ. Conversion-aware forecasting of Alzheimer’s disease via featurewise attention. Pattern Anal Applic. Jun 2025;28(2):64. [CrossRef]
  14. Xiao X, Li Y, Wu Q, et al. Development and validation of a novel predictive model for dementia risk in middle-aged and elderly depression individuals: a large and longitudinal machine learning cohort study. Alz Res Therapy. May 13, 2025;17(1):103. [CrossRef]
  15. Qiu S, Miller MI, Joshi PS, et al. Multimodal deep learning for Alzheimer’s disease dementia assessment. Nat Commun. Jun 20, 2022;13(1):3404. [CrossRef] [Medline]
  16. Chakravarthi BA, Shivakanth G. Integrating multimodal AI techniques and MRI preprocessing for enhanced diagnosis of Alzheimer’s disease: clinical applications and research horizons. IEEE Access. 2025;13:63519-63531. [CrossRef]
  17. Elazab A, Wang C, Abdelaziz M, et al. Alzheimer’s disease diagnosis from single and multimodal data using machine and deep learning models: achievements and future directions. Expert Syst Appl. Dec 2024;255:124780. [CrossRef]
  18. Ding K, Chetty M, Noori Hoshyar A, Bhattacharya T, Klein B. Speech based detection of Alzheimer’s disease: a survey of AI techniques, datasets and challenges. Artif Intell Rev. Oct 12, 2024;57(12):325. [CrossRef]
  19. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. Mar 29, 2021;372:n71. [CrossRef] [Medline]
  20. Rethlefsen ML, Kirtley S, Waffenschmidt S, et al. PRISMA-S: an extension to the PRISMA Statement for Reporting Literature Searches in Systematic Reviews. Syst Rev. Jan 26, 2021;10(1):39. [CrossRef] [Medline]
  21. Li T, Higgins JP, Deeks JJ. Collecting data. In: Cochrane Handbook for Systematic Reviews of Interventions. 2019:109-141. [CrossRef] ISBN: 978-1-119-53660-4
  22. Park B, Kim Y, Park J, et al. Integrating biomarkers from virtual reality and magnetic resonance imaging for the early detection of mild cognitive impairment using a multimodal learning approach: validation study. J Med Internet Res. Apr 17, 2024;26:e54538. [CrossRef] [Medline]
  23. Shah Z, Qi SA, Wang F, et al. Exploring language-agnostic speech representations using domain knowledge for detecting Alzheimer’s dementia. ICASSP 2023 - 2023 IEEE Int Conf Acoustics, Speech Signal Process (ICASSP). 2023:1-2. [CrossRef]
  24. Cortes C, Vapnik V. Support-vector networks. Mach Learn. Sep 1995;20(3):273-297. [CrossRef]
  25. Sharma A, Kaur S, Memon N, Jainul Fathima A, Ray S, Bhatt MW. Alzheimer’s patients detection using support vector machine (SVM) with quantitative analysis. Neurosci Inf. Nov 2021;1(3):100012. [CrossRef]
  26. Gao X, Liu H, Shi F, Shen D, Liu M. Brain status transferring generative adversarial network for decoding individualized atrophy in Alzheimer’s disease. IEEE J Biomed Health Inform. Oct 2023;27(10):4961-4970. [CrossRef] [Medline]
  27. Lazli L. Improved Alzheimer disease diagnosis with a machine learning approach and neuroimaging: case study development. JMIRx Med. Apr 21, 2025;6:e60866. [CrossRef] [Medline]
  28. Hossain F, Halder RK, Uddin MN. An integrated machine learning based adaptive error minimization framework for Alzheimer’s stage identification. Intell-Based Med. 2025;11:100243. [CrossRef]
  29. Fulkar B, Dhale T, Pacharaney U, Deshmukh S. Early detection of chronic diseases using machine and deep learning algorithms. 2025 4th Int Conf Sentiment Anal Deep Learn (ICSADL). 2025:1656-1661. [CrossRef]
  30. Sathiya A, Basha CH, S V, Sharmila P JJ, S P, Indhumathi R. Enhancing Alzheimer’s disease detection using optimized attribute selection and random forest classifier for improved accuracy. 2025 Int Conf Visual Anal Data Visualization (ICVADV). 2025:1174-1179. [CrossRef]
  31. Saleh AW, Gupta G, Khan SB, Alkhaldi NA, Verma A. An Alzheimer’s disease classification model using transfer learning Densenet with embedded healthcare decision support system. Decis Anal J. Dec 2023;9:100348. [CrossRef]
  32. Baucum M, Khojandi A, Papamarkou T. Hidden markov models as recurrent neural networks: an application to Alzheimer’s disease. 2021 IEEE 21st Int Conf Bioinf Bioeng (BIBE). 2021:1-6. [CrossRef]
  33. Cai Z, Zeng D, Marder KS, Honig LS, Wang Y. Dynamic classification of latent disease progression with auxiliary surrogate labels. arXiv. Preprint posted online on Dec 11, 2024. [CrossRef]
  34. Chen Y, Pham TD. Development of a brain MRI-based hidden Markov model for dementia recognition. Biomed Eng Online. 2013;12 Suppl 1(Suppl 1):S2. [CrossRef] [Medline]
  35. Vats NA, Yadavalli A, Gurugubelli K, Vuppala AK. Acoustic features, BERT model and their complementary nature for Alzheimer’s dementia detection. IC3 ’21. Aug 5, 2021:267-272. [CrossRef]
  36. Xiao R, Cui X, Qiao H, et al. Early diagnosis model of Alzheimer’s disease based on sparse logistic regression with the generalized elastic net. Biomed Signal Process Control. Apr 2021;66:102362. [CrossRef]
  37. Ablimit A, Botelho C, Abad A, Schultz T, Trancoso I. Exploring dementia detection from speech: cross corpus analysis. ICASSP 2022 - 2022 IEEE Int Conf Acoust, Speech Signal Proc (ICASSP). 2022:6472-6476. [CrossRef]
  38. Lahmiri S. Integrating convolutional neural networks, kNN, and Bayesian optimization for efficient diagnosis of Alzheimer’s disease in magnetic resonance images. Biomed Signal Process Control. Feb 2023;80:104375. [CrossRef]
  39. Suwalka D, Pandita D, Godse S, Patil RR, Salam Khan A, Kumar A. AI applications and simulation-based learning integrating future of nursing education. 2024 Int Conf Intell Innovative Pract Eng Manage (IIPEM). 2024:1-6. [CrossRef]
  40. Chaudhari A, Saratkar S, Thute T. AI-enhanced imaging techniques for understanding Alzheimer’s progression. 2025 Int Conf Mach Learn Auton Syst (ICMLAS). 2025:1174-1179. [CrossRef]
  41. Ango R, C KKR, Fatima S, Nag A. Brain connectivity analysis in Alzheimer’s disease using graph convolutional network. 2024 4th Int Conf Soft Comput Secur Appl (ICSCSA). 2024:133-139. [CrossRef]
  42. Chattopadhyay T, Joshy NA, Ozarkar SS, et al. Deep learning algorithms for Alzheimer’s disease detection based on diffusion MRI: tests in Indian and North American cohorts. Alzheimer’s Dementia. Dec 2024;20(S2):e089294. [CrossRef] [Medline]
  43. Ma D, Zhang H, Wang L. Editorial: deep learning methods and applications in brain imaging for the diagnosis of neurological and psychiatric disorders. Front Neurosci. 2024;18:1497417. [CrossRef] [Medline]
  44. Williams C, Anik FI, Hasan MM, et al. Advancing brain-computer interface closed-loop systems for neurorehabilitation: A systematic review of AI and machine learning innovations in biomedical engineering (preprint). JMIR Biomed Eng. Nov 5, 2025;10:e72218. [CrossRef] [Medline]
  45. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. Oct 18, 2011;155(8):529-536. [CrossRef] [Medline]
  46. de Swart WK, Loog M, Krijthe JH. A comparative study of methods for dynamic survival analysis. Front Neurol. 2025;16:1504535. [CrossRef] [Medline]
  47. Kang MK, Hong KS, Yang D, Kim HK. Multi-scale neural networks classification of mild cognitive impairment using functional near-infrared spectroscopy. Biocybern Biomed Eng. Jan 2025;45(1):11-22. [CrossRef]
  48. Abir SI, et al. EEG functional connectivity and deep learning for automated diagnosis of Alzheimer’s disease and schizophrenia. JCSTS. Jan 26, 2025;7(1):82-99. [CrossRef]
  49. Sathish R, Muthukumar R, Dhivya K, Karthikkumar S. Deep learning and IoT-enabled framework for accurate classification and monitoring of alzheimer’s disease based on eeg signal analysis. 2025 Fifth Int Conf Adv Electr, Comput, Commun Sustainable Technol (ICAECT). 2025:1-8. [CrossRef]
  50. Dubey AK, Kapoor R, Saraswat M. Optimized machine learning for medical data analysis and disease prediction. 2024 Int Conf Artif Intell Emerging Tech (Global AI Summit). 2024:1282-1286. [CrossRef]
  51. K P, Chitla VB, Aftab A, Kamath S. LSTM-based assistance for people with Alzheimer’s disease. 2025 Int Conf Intell Innovative Tech Comput, Electr Electron (IITCEE). 2025:1-5. [CrossRef]
  52. Pan J, Fan Z, Smith GE, Guo Y, Bian J, Xu J. Federated learning with multi-cohort real-world data for predicting the progression from mild cognitive impairment to Alzheimer’s disease. Alzheimer's Dement. Apr 2025;21(4):e70128. [CrossRef] [Medline]
  53. Zuo Y, Zhang B, Dong Y, et al. Glypred: lysine glycation site prediction via CCU–LightGBM–BiLSTM framework with multi-head attention mechanism. J Chem Inf Model. Aug 26, 2024;64(16):6699-6711. [CrossRef]
  54. Zhu M, Xu Z, Zhang Q, Liu Y, Gu D, Xu SD. GCSTormer: gated swin transformer with channel weights for image denoising. Expert Syst Appl. Jul 2025;284:127924. [CrossRef]
  55. Han X, Xue R, Feng J, et al. Hypergraph foundation model for brain disease diagnosis. IEEE Trans Neural Netw Learning Syst. 2025;36(10):17702-17716. [CrossRef]
  56. Lu SY, Zhang YD, Yao YD. A regularized transformer with adaptive token fusion for Alzheimer’s disease diagnosis in brain magnetic resonance images. Eng Appl Artif Intell. Sep 2025;155:111058. [CrossRef]
  57. Li X, Zhu W, Qiu P, Dumitrascu OM, Youssef A, Wang Y. A BERT-style self-supervised learning CNN for disease identification from retinal images. arXiv. Preprint posted online on Apr 25, 2025. [CrossRef]
  58. Mahapatra C. Exploring advanced applications of artificial intelligence in neuropharmacology: a comprehensive overview. Biol Life Sci. Preprint posted online on May 8, 2025. [CrossRef]
  59. Ren H, Zheng Y, Li C, et al. Using machine learning to predict cognitive decline in older adults from the Chinese longitudinal healthy longevity survey: model development and validation study. JMIR Aging. Apr 30, 2025;8:e67437. [CrossRef] [Medline]
  60. Shah YAR, Qureshi SM, Qureshi HA, Shah SUR, Ahmad A, Shiwlani A. Advances in artificial intelligence and machine learning for neurodegenerative disease: a literature review. WJRR. Sep 5, 2024;19(3):4-18. [CrossRef]
  61. Fatima G, Ashiquzzaman A, Kim SS, Kim YR, Kwon HS, Chung E. Vascular and glymphatic dysfunction as drivers of cognitive impairment in Alzheimer’s disease: insights from computational approaches. Neurobiol Dis. May 2025;208:106877. [CrossRef] [Medline]
  62. Yang X, Dang X, Cai J, Li J, Wang X, Heng P. Temporal‐multimodal consistency alignment for Alzheimer’s cognitive assessment prediction. Med Phys. Jun 2025;52(6):5064-5080. [CrossRef]
  63. Sadeghian R, Haider F, Fraser K, Tasaki S, Muniz-Terrera G. Editorial: methods in artificial intelligence for dementia 2024. Front Dement. 2024;3:1444825. [CrossRef] [Medline]
  64. Kale M, Wankhede N, Pawar R, et al. AI-driven innovations in Alzheimer’s disease: integrating early diagnosis, personalized treatment, and prognostic modelling. Ageing Res Rev. Nov 2024;101:102497. [CrossRef]
  65. UK Biobank. URL: https://ukbiobank.ac.uk [Accessed 2026-02-07]
  66. ADNI. URL: https://adni.loni.usc.edu [Accessed 2026-02-07]
  67. Open access series of imaging studies (OASIS). Washington University in St Louis. URL: https://sites.wustl.edu/oasisbrains/ [Accessed 2026-02-07]
  68. NACC. URL: https://naccdata.org/ [Accessed 2026-02-07]
  69. Framingham Heart Study. URL: https://www.framinghamheartstudy.org/ [Accessed 2026-02-07]
  70. AIBL. URL: https://aibl.csiro.au [Accessed 2026-02-07]
  71. TalkBank. URL: https://dementia.talkbank.org/ [Accessed 2026-02-07]
  72. Gkoumas D, Wang B, Tsakalidis A, et al. A longitudinal multi-modal dataset for dementia monitoring and diagnosis. Lang Resour Eval. 2024;58(3):883-902. [CrossRef] [Medline]
  73. Xu T, Wang X, Lun X, Pan H, Wang Z. ADReFV: face video dataset based on human‐computer interaction for Alzheimer’s disease recognition. Comput Animation Virtual. Jan 2023;34(1):e2127. [CrossRef]
  74. GENCODE. URL: https://www.gencodegenes.org/ [Accessed 2026-02-07]
  75. Velazquez M, Lee Y. Multimodal ensemble model for Alzheimer’s disease conversion prediction from early mild cognitive impairment subjects. Comput Biol Med. Dec 2022;151(Pt A):106201. [CrossRef] [Medline]
  76. Zhang M, Cui Q, Lü Y, Yu W, Li W. A multimodal learning machine framework for Alzheimer’s disease diagnosis based on neuropsychological and neuroimaging data. Comput Ind Eng. Nov 2024;197:110625. [CrossRef]
  77. Fabietti M, Mahmud M, Lotfi A, et al. Early detection of Alzheimer’s disease from cortical and hippocampal local field potentials using an ensembled machine learning model. IEEE Trans Neural Syst Rehabil Eng. 2023;31:2839-2848. [CrossRef]
  78. Seifallahi M, Mehraban AH, Galvin JE, Ghoraani B. Alzheimer’s disease detection using comprehensive analysis of Timed Up and Go Test via Kinect V.2 camera and machine learning. IEEE Trans Neural Syst Rehabil Eng. 2022;30:1589-1600. [CrossRef] [Medline]
  79. Xue C, Kowshik SS, Lteif D, et al. AI-based differential diagnosis of dementia etiologies on multimodal data. Nat Med. Oct 2024;30(10):2977-2989. [CrossRef] [Medline]
  80. Shi J, Zheng X, Li Y, Zhang Q, Ying S. Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease. IEEE J Biomed Health Inform. Jan 2018;22(1):173-183. [CrossRef]
  81. Allwright M, Mundell HD, McCorkindale AN, et al. Ranking the risk factors for Alzheimer’s disease; findings from the UK Biobank study. Aging Brain. 2023;3:100081. [CrossRef] [Medline]
  82. Gu Z, Liu S, Ma H, et al. Estimation of machine learning-based models to predict dementia risk in patients with atherosclerotic cardiovascular diseases: UK Biobank study. JMIR Aging. Feb 26, 2025;8:e64148. [CrossRef] [Medline]
  83. You J, Zhang YR, Wang HF, et al. Development of a novel dementia risk prediction model in the general population: a large, longitudinal, population-based machine-learning study. eClinicalMedicine. Nov 2022;53:101665. [CrossRef]
  84. Calvo N, McFall GP, Ramana S, et al. Associated risk and resilience factors of Alzheimer’s disease in women with early bilateral oophorectomy: data from the UK Biobank. J Alzheimers Dis. Nov 2024;102(1):119-128. [CrossRef] [Medline]
  85. Yi F, Yuan J, Somekh J, et al. Genetically supported targets and drug repurposing for brain aging: a systematic study in the UK Biobank. Sci Adv. Mar 14, 2025;11(11):eadr3757. [CrossRef] [Medline]
  86. Yousefzadeh N, Tran C, Ramirez-Zamora A, Chen J, Fang R, Thai MT. Neuron-level explainable AI for Alzheimer’s disease assessment from fundus images. Sci Rep. Apr 2, 2024;14(1):7710. [CrossRef] [Medline]
  87. Gong W, Bai S, Zheng YQ, Smith SM, Beckmann CF. Supervised phenotype discovery from multimodal brain imaging. IEEE Trans Med Imaging. Mar 2023;42(3):834-849. [CrossRef] [Medline]
  88. Lian C, Liu M, Pan Y, Shen D. Attention-guided hybrid network for dementia diagnosis with structural MR images. IEEE Trans Cybern. Apr 2022;52(4):1992-2003. [CrossRef] [Medline]
  89. Lian C, Liu M, Wang L, Shen D. Multi-task weakly-supervised attention network for dementia status estimation with structural MRI. IEEE Trans Neural Netw Learning Syst. Aug 2022;33(8):4056-4068. [CrossRef]
  90. Li H, Habes M, Wolk DA, Fan Y; Alzheimer’s Disease Neuroimaging Initiative and the Australian Imaging Biomarkers and Lifestyle Study of Aging. A deep learning model for early prediction of Alzheimer’s disease dementia based on hippocampal magnetic resonance imaging data. Alzheimer’s Dementia. Aug 2019;15(8):1059-1070. [CrossRef] [Medline]
  91. Oh K, Yoon JS, Suk HI. Learn-explain-reinforce: counterfactual reasoning and its guidance to reinforce an Alzheimer’s disease diagnosis model. IEEE Trans Pattern Anal Mach Intell. Apr 2023;45(4):4843-4857. [CrossRef] [Medline]
  92. Lian C, Liu M, Zhang J, Shen D. Hierarchical fully convolutional network for joint atrophy localization and Alzheimer’s disease diagnosis using structural MRI. IEEE Trans Pattern Anal Mach Intell. Apr 2020;42(4):880-893. [CrossRef] [Medline]
  93. Avsec Ž, Agarwal V, Visentin D, et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods. Oct 2021;18(10):1196-1203. [CrossRef]
  94. Yang L, Wang X, Guo Q, et al; for the Alzheimer’s Disease Neuroimaging Initiative. Deep learning based multimodal progression modeling for Alzheimer’s disease. Stat Biopharm Res. Jul 3, 2021;13:337-343. [CrossRef]
  95. Lee LY, Vaghari D, Burkhart MC, et al. Robust and interpretable AI-guided marker for early dementia prediction in real-world clinical settings. eClinicalMedicine. Aug 2024;74:102725. [CrossRef]
  96. Zhu W, Sun L, Huang J, Han L, Zhang D. Dual attention multi-instance deep learning for Alzheimer’s disease diagnosis with structural MRI. IEEE Trans Med Imaging. Sep 2021;40(9):2354-2366. [CrossRef]
  97. Zhang M, Cui Q, Lü Y, Li W. A feature-aware multimodal framework with auto-fusion for Alzheimer’s disease diagnosis. Comput Biol Med. Aug 2024;178:108740. [CrossRef] [Medline]
  98. Bi XA, Hu X, Wu H, Wang Y. Multimodal data analysis of Alzheimer’s disease based on clustering evolutionary random forest. IEEE J Biomed Health Inform. Oct 2020;24(10):2973-2983. [CrossRef] [Medline]
  99. Bi XA, Xing Z, Zhou W, Li L, Xu L. Pathogeny detection for mild cognitive impairment via weighted evolutionary random forest with brain imaging and genetic data. IEEE J Biomed Health Inform. Jul 2022;26(7):3068-3079. [CrossRef] [Medline]
  100. Hashmi A, Barukab O. Dementia classification using deep reinforcement learning for early diagnosis. Appl Sci (Basel). Jan 22, 2023;13(3):1464. [CrossRef]
  101. Wang Y, Gao R, Wei T, et al. Predicting long-term progression of Alzheimer’s disease using a multimodal deep learning model incorporating interaction effects. J Transl Med. Mar 11, 2024;22(1):265. [CrossRef]
  102. Hatami M, Yaghmaee F, Ebrahimpour R. Investigating the potential of reinforcement learning and deep learning in improving Alzheimer’s disease classification. Neurocomputing. Sep 2024;597:128119. [CrossRef]
  103. Tabarestani S, Aghili M, Eslami M, et al. A distributed multitask multimodal approach for the prediction of Alzheimer’s disease in a longitudinal study. Neuroimage. Feb 1, 2020;206:116317. [CrossRef] [Medline]
  104. Burkhart MC, Lee LY, Vaghari D, et al. Unsupervised multimodal modeling of cognitive and brain health trajectories for early dementia prediction. Sci Rep. May 10, 2024;14(1):10755. [CrossRef] [Medline]
  105. El-Sappagh S, Alonso JM, Islam SMR, Sultan AM, Kwak KS. A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer’s disease. Sci Rep. Jan 29, 2021;11(1):2660. [CrossRef] [Medline]
  106. Lee MW, Kim HW, Choe YS, et al. A multimodal machine learning model for predicting dementia conversion in Alzheimer’s disease. Sci Rep. May 29, 2024;14(1):12276. [CrossRef]
  107. Yuan S, Li H, Wu J, Sun X. Classification of mild cognitive impairment with multimodal data using both labeled and unlabeled samples. IEEE/ACM Trans Comput Biol and Bioinf. Nov 1, 2021;18(6):2281-2290. [CrossRef]
  108. Cirincione A, Lynch K, Bennett J, et al. Prediction of future dementia among patients with mild cognitive impairment (MCI) by integrating multimodal clinical data. Heliyon. Sep 15, 2024;10(17):e36728. [CrossRef] [Medline]
  109. Cassani R, Falk TH. Alzheimer’s disease diagnosis and severity level detection based on electroencephalography modulation spectral “patch” features. IEEE J Biomed Health Inform. Jul 2020;24(7):1982-1993. [CrossRef] [Medline]
  110. Cilia ND, D’Alessandro T, De Stefano C, Fontanella F, Molinara M. From online handwriting to synthetic images for Alzheimer’s disease detection using a deep transfer learning approach. IEEE J Biomed Health Inform. Dec 2021;25(12):4243-4254. [CrossRef]
  111. Kmetzsch V, Becker E, Saracino D, et al. Disease progression score estimation from multimodal imaging and MicroRNA data using supervised variational autoencoders. IEEE J Biomed Health Inform. Dec 2022;26(12):6024-6035. [CrossRef]
  112. Mengoudi K, Ravi D, Yong KXX, et al. Augmenting dementia cognitive assessment with instruction-less eye-tracking tests. IEEE J Biomed Health Inform. Nov 2020;24(11):3066-3075. [CrossRef] [Medline]
  113. Tsai H, Yang TW, Ou KH, Su TH, Lin C, Chou CF. Multimodal attention network for dementia prediction. IEEE J Biomed Health Inform. Nov 2024;28(11):6918-6930. [CrossRef]
  114. Wu EQ, Peng XY, Chen SD, Zhao XY, Tang ZR. Detecting Alzheimer’s dementia degree. IEEE Trans Cogn Dev Syst. Mar 2022;14(1):116-125. [CrossRef]
  115. Zhang H, Ni M, Yang Y, et al. Patch-based interpretable deep learning framework for Alzheimer’s disease diagnosis using multimodal data. Biomed Signal Process Control. Feb 2025;100:107085. [CrossRef]
  116. Fan CC, Yang H, Zhang C, et al. Graph reasoning module for Alzheimer’s disease diagnosis: a plug-and-play method. IEEE Trans Neural Syst Rehabil Eng. 2023;31:4773-4780. [CrossRef]
  117. Beebe-Wang N, Okeson A, Althoff T, Lee SI. Efficient and explainable risk assessments for imminent dementia in an aging cohort study. IEEE J Biomed Health Inform. Jul 2021;25(7):2409-2420. [CrossRef]
  118. Battineni G, Hossain MA, Chintalapudi N, et al. Improved Alzheimer’s disease detection by MRI using multimodal machine learning algorithms. Diagnostics (Basel). Nov 13, 2021;11(11):2103. [CrossRef]
  119. Nguyen H, Chu NN. An introduction to deep learning research for Alzheimer’s disease. IEEE Consumer Electron Mag. May 1, 2021;10(3):72-75. [CrossRef]
  120. Fan F, Song H, Jiang J, et al. Development and validation of a multimodal deep learning framework for vascular cognitive impairment diagnosis. iScience. Oct 2024;27(10):110945. [CrossRef]
  121. Ilias L, Askounis D, Psarras J. Detecting dementia from speech and transcripts using transformers. Comput Speech Lang. Apr 2023;79:101485. [CrossRef]
  122. Poor FF, Dodge HH, Mahoor MH. A multimodal cross-transformer-based model to predict mild cognitive impairment using speech, language and vision. Comput Biol Med. Nov 2024;182:109199. [CrossRef] [Medline]
  123. Lin K, Washington PY. Multimodal deep learning for dementia classification using text and audio. Sci Rep. Jun 16, 2024;14(1):13887. [CrossRef]
  124. Ortiz-Perez D, Ruiz-Ponce P, Tomás D, Garcia-Rodriguez J, Vizcaya-Moreno MF, Leo M. A deep learning-based multimodal architecture to predict signs of dementia. Neurocomputing. Sep 2023;548:126413. [CrossRef]
  125. Ilias L, Askounis D. Explainable identification of dementia from transcripts using transformer networks. IEEE J Biomed Health Inform. Aug 2022;26(8):4153-4164. [CrossRef] [Medline]
  126. Wen B, Wang N, Subbalakshmi K, Chandramouli R. Revealing the roles of part-of-speech taggers in Alzheimer disease detection: scientific discovery using one-intervention causal explanation. JMIR Form Res. May 2, 2023;7:e36590. [CrossRef] [Medline]
  127. Chen X, Pu Y, Li J, Zhang WQ. Cross-lingual Alzheimer’s disease detection based on paralinguistic and pre-trained features. ICASSP 2023 - 2023 IEEE Int Conf Acoustics, Speech Signal Proc (ICASSP). 2023:1-2. [CrossRef]
  128. Zheng C, Bouazizi M, Ohtsuki T. An evaluation on information composition in dementia detection based on speech. IEEE Access. 2022;10:92294-92306. [CrossRef]
  129. Nambiar AS, Likhita K, Pujya K, Gupta D, Vekkot S, Lalitha S. Comparative study of deep classifiers for early dementia detection using speech transcripts. 2022 IEEE 19th India Counc Int Conf (INDICON). 2022:1-6. [CrossRef]
  130. Priyadarshinee P, Clarke CJ, Melechovsky J, Lin CMY, Balamurali BT, Chen JM. Alzheimer’s dementia speech (audio vs. text): multi-modal machine learning at high vs. low resolution. Appl Sci (Basel). 2023;13(7):4244. [CrossRef]
  131. Liu J, Fu F, Li L, et al. Efficient pause extraction and encode strategy for Alzheimer’s disease detection using only acoustic features from spontaneous speech. Brain Sci. Mar 11, 2023;13(3):477. [CrossRef] [Medline]
  132. Mahajan P, Baths V. Acoustic and language based deep learning approaches for Alzheimer’s dementia detection from spontaneous speech. Front Aging Neurosci. 2021;13:623607. [CrossRef] [Medline]
  133. Mei K, Ding X, Liu Y, et al. The USTC system for ADReSS-M challenge. ICASSP 2023 - 2023 IEEE Int Conf Acoustics, Speech Signal Proc (ICASSP). 2023:1-2. [CrossRef]
  134. Ali Meerza SI, Li Z, Liu L, Zhang J, Liu J. Fair and privacy-preserving Alzheimer’s disease diagnosis based on spontaneous speech analysis via federated learning. 2022 44th Ann Int Conf IEEE Eng Med Biol Soc (EMBC). 2022:1362-1365. [CrossRef]
  135. Chen W, Xing X, Xu X, Pang J, Du L. SpeechFormer++: a hierarchical efficient framework for paralinguistic speech processing. IEEE/ACM Trans Audio Speech Lang Process. 2023;31:775-788. [CrossRef]
  136. Tamm B, Vandenberghe R, Van Hamme H. Cross-lingual transfer learning for Alzheimer’s detection from spontaneous speech. ICASSP 2023 - 2023 IEEE Int Conf Acoust, Speech Signal Process (ICASSP). 2023:1-2. [CrossRef]
  137. Woszczyk D, Hedlikova A, Akman A, Demetriou S, Schuller B. Data augmentation for dementia detection in spoken language. Proc Interspeech 2022. 2022:2858-2862. [CrossRef]
  138. Jin L, Oh Y, Kim H, et al. CONSEN: complementary and simultaneous ensemble for Alzheimer’s disease detection and MMSE score prediction. ICASSP 2023 - 2023 IEEE Int Conf Acoustics, Speech Signal Proc (ICASSP). 2023:1-2. [CrossRef]
  139. Ilias L, Askounis D. Context-aware attention layers coupled with optimal transport domain adaptation and multimodal fusion methods for recognizing dementia from spontaneous speech. Knowl Based Syst. Oct 2023;277:110834. [CrossRef]
  140. Azevedo T, Bethlehem RAI, Whiteside DJ, et al. Identifying healthy individuals with Alzheimer’s disease neuroimaging phenotypes in the UK Biobank. Commun Med. Jul 20, 2023;3(1):100. [CrossRef]
  141. Liang S, Chen T, Ma J, Ren S, Lu X, Du W. Identification of mild cognitive impairment using multimodal 3D imaging data and graph convolutional networks. Phys Med Biol. Dec 7, 2024;69(23):235002. [CrossRef]
  142. Jahan S, Abu Taher K, Kaiser MS, et al. Explainable AI-based Alzheimer’s prediction and management using multimodal data. PLOS ONE. 2023;18(11):e0294253. [CrossRef] [Medline]
  143. Jahan S, Saif Adib M, Huda SM, et al. Federated explainable AI-based Alzheimer’s disease prediction with multimodal data. IEEE Access. 2025;13:43435-43454. [CrossRef]
  144. Myrzashova R, Alsamhi SH, Shvetsov AV, Hawbani A, Guizani M, Wei X. BCFTL: blockchain-enabled multimodal federated transfer learning for decentralized Alzheimer’s diagnosis. IEEE Internet Things J. 2025;12(15):29656-29669. [CrossRef]
  145. Chen K, Weng Y, Huang Y, et al. A multi‐view learning approach with diffusion model to synthesize FDG PET from MRI T1WI for diagnosis of Alzheimer’s disease. Alzheimers Dement. Feb 2025;21(2):e14421. [CrossRef]
  146. Lin W, Lin W, Chen G, et al. Bidirectional mapping of brain MRI and PET with 3D Reversible GAN for the diagnosis of Alzheimer’s disease. Front Neurosci. 2021;15:646013. [CrossRef] [Medline]
  147. Gupta B, Jegannathan GK, Alam MS, et al. Multimodal lightweight neural network for Alzheimer’s disease diagnosis integrating neuroimaging and cognitive scores. Neurosci Inf. Sep 2025;5(3):100218. [CrossRef]
  148. Chen Z, Wang Z, Zhao M, et al. A new classification network for diagnosing Alzheimer’s disease in class-imbalance MRI datasets. Front Neurosci. Aug 25, 2022;16:807085. [CrossRef]
  149. Sarma M, Chatterjee D. Multistage diagnosis of Alzheimer’s disease from clinical data using ‘deep ensemble learning’. JAIAI. 2024;1(1):122-138. [CrossRef]
  150. Mujahid M, Rehman A, Alam T, Alamri FS, Fati SM, Saba T. An efficient ensemble approach for Alzheimer’s disease detection using an adaptive synthetic technique and deep learning. Diagnostics (Basel). Jul 26, 2023;13(15):2489. [CrossRef] [Medline]
  151. Dubey Y, Bhongade A, Palsodkar P, Fulzele P. Efficient explainable models for Alzheimer’s disease classification with feature selection and data balancing approach using ensemble learning. Diagnostics (Basel). Dec 10, 2024;14(24):2770. [CrossRef] [Medline]
  152. Mandawkar U, Diwan T. Hybrid cuttlefish-grey wolf optimization tuned weighted ensemble classifier for Alzheimer’s disease classification. Biomed Signal Process Control. Jun 2024;92:106101. [CrossRef]
  153. Jasodanand VH, Kowshik SS, Puducheri S, et al. AI-driven fusion of multimodal data for Alzheimer’s disease biomarker assessment. Nat Commun. Aug 11, 2025;16(1):7407. [CrossRef] [Medline]
  154. Weiner MW, Kanoria S, Miller MJ, et al. Overview of Alzheimer’s disease neuroimaging Initiative and future clinical trials. Alzheimer's Dement. Jan 2025;21(1):e14321. [CrossRef] [Medline]
  155. Wilkinson T, Schnier C, Bush K, et al. Identifying dementia outcomes in UK Biobank: a validation study of primary care, hospital admissions and mortality data. Eur J Epidemiol. Jun 2019;34(6):557-565. [CrossRef]
  156. Thulasimani V, Shanmugavadivel K, Cho J, Easwaramoorthy SV. A review of datasets, optimization strategies, and learning algorithms for analyzing Alzheimer’s dementia detection. Neuropsychiatr Dis Treat. 2024;20:2203-2225. [CrossRef] [Medline]
  157. Chan KCG, Xia F, Kukull WA. NACC data: who is represented over time and across centers, and implications for generalizability. Alzheimer's Dement. Sep 2025;21(9):e70657. [CrossRef] [Medline]
  158. Fowler C, Rainey-Smith SR, Bird S, et al. Fifteen years of the Australian Imaging, Biomarkers and Lifestyle (AIBL) study: progress and observations from 2,359 older adults spanning the spectrum from cognitive normality to Alzheimer’s disease. J Alzheimer’s Dis Rep. Mar 11, 2021;5(1):443-468. [CrossRef]
  159. Yang Q, Li X, Ding X, Xu F, Ling Z. Deep learning-based speech analysis for Alzheimer’s disease detection: a literature review. Alz Res Therapy. Dec 14, 2022;14(1):186. [CrossRef]
  160. He Y, Wang Z, Zhang Y, et al. NeuroSymAD: a neuro-symbolic framework for interpretable Alzheimer’s disease diagnosis. Preprint posted online on Mar 1, 2025. [CrossRef]
  161. Sadeghi A, Hajati F, Argha A, Lovell NH, Yang M. Interpretable graph-based models on multimodal biomedical data integration: a technical review and benchmarking. arXiv. Preprint posted online on May 3, 2025. [CrossRef]
  162. Mahamud E, Assaduzzaman M, Islam J, Fahad N, Hossen MJ, Ramanathan TT. Enhancing Alzheimer’s disease detection: an explainable machine learning approach with ensemble techniques. Intell-Based Med. 2025;11:100240. [CrossRef]
  163. Zhou T, Liu M, Thung KH, Shen D. Latent representation learning for Alzheimer’s disease diagnosis with incomplete multi-modality neuroimaging and genetic data. IEEE Trans Med Imaging. Oct 2019;38(10):2411-2422. [CrossRef]
  164. Sharma R, Sibille L, Fahmi R. Multi‐branch convolutional neural network for Alzheimer’s disease versus normal control classification using PET images. Alzheimer’s Dementia. Jun 2023;19(S3):e061092. [CrossRef]
  165. Zhang J, Yu X, Chen T, et al. BrainNet-moe: brain-inspired mixture-of-experts learning for neurological disease identification. Preprint posted online on Mar 5, 2025. [CrossRef]


AD: Alzheimer disease
AdaBoost: Adaptive Boosting
ADNI: Alzheimer’s Disease Neuroimaging Initiative
ADReSS: Alzheimer’s Dementia Recognition Through Spontaneous Speech
ADReSSo: Alzheimer’s Dementia Recognition Through Spontaneous Speech 2021 Challenge
AI: artificial intelligence
ALBERT: A Lite Bidirectional Encoder Representations From Transformers
AUC: area under the curve
BERT: Bidirectional Encoder Representations From Transformers
BiLSTM: bidirectional long short-term memory
DeiT: Data-Efficient Image Transformers
DNN: deep neural network
EEG: electroencephalography
FL: federated learning
HIPAA: Health Insurance Portability and Accountability Act
LightGBM: Light Gradient-Boosting Machine
LIME: Local Interpretable Model-Agnostic Explanations
MCI: mild cognitive impairment
ML: machine learning
MRI: magnetic resonance imaging
NACC: National Alzheimer’s Coordinating Center
OASIS: Open Access Series of Imaging Studies
PET: positron emission tomography
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PRISMA-S: Preferred Reporting Items for Systematic Reviews and Meta-Analyses literature search extension
QUADAS-2: Revised Quality Assessment of Diagnostic Accuracy Studies tool
RL: reinforcement learning
RoBERTa: Robustly Optimized Bidirectional Encoder Representations From Transformers Approach
SHAP: Shapley Additive Explanations
ViT: vision transformer
XGBoost: Extreme Gradient Boosting


Edited by Stefano Brini; submitted 07.Oct.2025; peer-reviewed by Farah Elkourdi, Mohammad Mamun Sikder, Oladayo Oyetunji, Shan Jiang; final revised version received 09.Jan.2026; accepted 09.Jan.2026; published 25.Mar.2026.

Copyright

© Ziwen Yu, Anthony Mulholland, Tianyan Huang, Qiang Liu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.Mar.2026.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.